Quantize Your Deep Learning Model to Run on an NPU

Preparing a TensorFlow Model to Run Inference on an i.MX 8M Plus NPU

Introduction

In this article, we explain which steps you have to take to transform and quantize your model with different TensorFlow versions. We only look into post-training quantization.

We are using the phyBOARD-Pollux to run our model. The phyBOARD-Pollux incorporates an i.MX 8M Plus, which features a dedicated neural network accelerator IP from VeriSilicon (the Vivante VIP8000).

phyBOARD-Pollux [Image via phytec.de under license to Jan Werth]

i.MX 8M Plus block diagram from NXP [Image via phytec.de, with permission from NXP]

As the neural processing unit (NPU) from NXP needs a fully int8-quantized model, we have to look into full int8 quantization of a TensorFlow Lite or PyTorch model. Both libraries are supported by NXP's eIQ library. Here we will only look at the TensorFlow variant.
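To make this concrete, here is a minimal sketch of full int8 post-training quantization with the TensorFlow 2.x converter. The model path, input shape, and calibration data are placeholders; in practice you would feed a few hundred real, preprocessed samples through the representative dataset.

```python
import numpy as np
import tensorflow as tf

# Load your trained Keras model (path and input shape are placeholders).
model = tf.keras.models.load_model("my_model.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset():
    # Calibration data: replace the random tensors with real,
    # preprocessed samples from your training set.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset

# Force full integer quantization: conversion fails if an op has
# no int8 kernel instead of silently falling back to float32.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_quant_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

Setting inference_input_type and inference_output_type to tf.int8 is what makes the model fully int8 end to end, including its input and output tensors; without these, the converter keeps float interfaces and inserts quantize/dequantize ops at the boundaries.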

A general overview of how to do post-training quantization can be found on the TensorFlow website.
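Once the model is fully quantized, you can run it on the board with the tflite_runtime interpreter and hand it to the NPU through an external delegate. The sketch below assumes NXP's BSP ships the VX delegate at /usr/lib/libvx_delegate.so; the library path and model name are assumptions to adapt to your image.

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# Delegate path is an assumption; check where your BSP installs it.
delegate = tflite.load_delegate("/usr/lib/libvx_delegate.so")

interpreter = tflite.Interpreter(
    model_path="model_quant_int8.tflite",
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# A fully int8 model expects int8 inputs: quantize float data with
# the scale and zero point stored in the model.
scale, zero_point = input_details[0]["quantization"]
x = np.random.rand(1, 224, 224, 3).astype(np.float32)  # placeholder input
x_int8 = np.round(x / scale + zero_point).astype(np.int8)

interpreter.set_tensor(input_details[0]["index"], x_int8)
interpreter.invoke()

# Dequantize the int8 output back to float for interpretation.
y_int8 = interpreter.get_tensor(output_details[0]["index"])
out_scale, out_zero = output_details[0]["quantization"]
y = (y_int8.astype(np.float32) - out_zero) * out_scale
```

Note that the first inference after loading is slow, as the delegate compiles the graph for the NPU; subsequent inferences run at full speed.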
