Earlier this month, Google researchers released a new algorithm called MLP-Mixer, an architecture based exclusively on multi-layered perceptrons (MLPs) for computer vision. The MLP-Mixer code is now available on GitHub.
MLPs are commonly used for machine learning problems on tabular datasets, such as classification and regression prediction. In image processing, however, researchers and developers have so far relied mainly on convolutional neural networks (CNNs) and attention-based networks (transformers) rather than MLPs.
In a recent paper, Google has introduced MLP-Mixer.
“While convolutions and attention are both sufficient for good performance, neither of them are necessary,” reads the paper, *MLP-Mixer: An all-MLP Architecture for Vision*, co-authored by Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Thomas Unterthiner, Lucas Beyer, Xiaohua Zhai, Jessica Yung, Daniel Keysers, Mario Lucic, Jakob Uszkoreit and Alexey Dosovitskiy.
Interestingly, the new model achieves similar results compared to the state-of-the-art models trained on large datasets with almost 3x speed. “When trained on large datasets, MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models,” claimed Google AI.
MLP-Mixer contains two types of layers — one with MLPs applied independently to image patches (‘mixing’ the per-location features), and one with MLPs applied across patches (‘mixing’ spatial information).
The image below depicts the macro-structure of Mixer with Mixer layers, per-patch linear embeddings and a classifier head. Mixer layers contain one channel-mixing MLP and one token-mixing MLP, each consisting of two fully connected layers and a GELU nonlinearity. Other components include skip-connections, layer norm on the channels, dropout, and linear classifier head.
The model accepts a sequence of linearly projected image patches as input and maintains this dimensionality throughout. Mixer then relies on two types of MLP layers: channel-mixing MLPs and token-mixing MLPs.
Channel-mixing MLPs allow communication between different channels; they operate on each token independently, taking individual rows of the table as inputs. Token-mixing MLPs allow communication between different spatial locations (tokens); they operate on each channel independently, taking individual columns of the table as inputs. The two layer types are interleaved to enable interaction across both input dimensions.
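The alternation of token mixing and channel mixing described above can be sketched as follows. This is a minimal NumPy illustration of a single Mixer layer, not the official implementation; the toy sizes, parameter initialisation and helper names (`mlp`, `mixer_layer`) are assumptions made for the example.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU nonlinearity used in Mixer MLPs
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # layer norm over the channel (last) dimension
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp(x, w1, b1, w2, b2):
    # two fully connected layers with a GELU nonlinearity in between
    return gelu(x @ w1 + b1) @ w2 + b2

def mixer_layer(x, params):
    # x: (num_patches, channels) -- the "table" of patch embeddings
    # token mixing: transpose so the MLP acts across patches (columns),
    # independently for each channel; skip-connection adds the input back
    y = x + mlp(layer_norm(x).T, *params["token"]).T
    # channel mixing: the MLP acts across channels (rows),
    # independently for each patch
    return y + mlp(layer_norm(y), *params["channel"])

# toy sizes (assumed): 16 patches, 8 channels, hidden widths 32 and 64
rng = np.random.default_rng(0)
S, C, Ds, Dc = 16, 8, 32, 64
params = {
    "token":   (rng.normal(0, 0.02, (S, Ds)), np.zeros(Ds),
                rng.normal(0, 0.02, (Ds, S)), np.zeros(S)),
    "channel": (rng.normal(0, 0.02, (C, Dc)), np.zeros(Dc),
                rng.normal(0, 0.02, (Dc, C)), np.zeros(C)),
}
x = rng.normal(size=(S, C))
out = mixer_layer(x, params)
print(out.shape)  # (16, 8) -- dimensionality is preserved
```

Note how the token-mixing MLP sees one column of the table at a time and the channel-mixing MLP sees one row, matching the description above.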
“In the extreme situation, our architecture can be seen as a unique CNN, which uses (1×1) convolutions for channel mixing, and single-channel depth-wise convolutions for token mixing. However, the converse is not true as CNNs are not special cases of Mixer,” explained Google AI.
Convolution is more complex than the plain matrix multiplication used in MLPs, as it requires either an additional, costly reduction to matrix multiplication or a specialised implementation.
Today, image processing networks typically mix features at a given spatial location, mix features between different spatial locations, or do both. In CNNs, both kinds of mixing are performed with convolutions, kernels and pooling, while vision transformers perform them with self-attention.
MLP-Mixer, on the other hand, attempts to do both in a more ‘separate’ fashion and only using MLPs. The advantage of only using MLPs — essentially matrix multiplication — is the simplicity of the architecture and the computational speed.
Moreover, the computational complexity of MLP-Mixer is linear in the number of input patches, unlike vision transformers, whose complexity is quadratic. The model also uses skip-connections and regularisation.
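The linear-versus-quadratic scaling can be made concrete with a rough multiply count per layer. The formulas below are illustrative back-of-the-envelope estimates, not measurements from the paper; the chosen widths (`C=512`, `Ds=256`) and helper names are assumptions.

```python
def mixer_token_mlp_mults(S, C, Ds):
    # two dense layers over the patch axis (S -> Ds -> S),
    # applied once per channel: linear in S for a fixed hidden width Ds
    return C * (S * Ds + Ds * S)

def self_attention_mults(S, C):
    # Q @ K^T and attention @ V each cost S*S*C multiplies:
    # quadratic in the sequence length S
    return 2 * S * S * C

for S in (64, 256, 1024):
    print(S, mixer_token_mlp_mults(S, C=512, Ds=256),
          self_attention_mults(S, C=512))
```

Quadrupling the number of patches roughly quadruples the token-mixing cost but multiplies the self-attention cost by sixteen, which is why Mixer's cost grows much more slowly at high resolutions.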
In short, the advantages of MLP-Mixer include the simplicity of an all-MLP architecture, computational cost that grows only linearly with the number of input patches, and competitive accuracy when pre-trained on large datasets.