Accelerating Tensorflow Lite with XNNPACK

Tensorflow Lite is one of my favourite software packages. It enables easy and fast deployment on a range of hardware and now comes with a wide range of delegates to accelerate inference — GPU, Core ML and Hexagon, to name a few. One drawback of Tensorflow Lite however is that it’s been designed with mobile applications in mind, and therefore isn’t optimised for Intel & AMD x86 processors. Better x86 support is on the Tensorflow Lite development roadmap, but for now Tensorflow Lite mostly relies on converting ARM Neon instructions to SSE via the Neon_2_SSE bridge.

There is however a new Tensorflow Lite delegate for CPU-based floating-point computations, XNNPACK, that does feature x86 AVX and AVX-512 optimizations. In this post I’ll walk you through using XNNPACK and show some benchmarks.

#tensorflow-lite #machine-learning #tensorflow #xnnpack

towardsdatascience.com

Accelerating Tensorflow Lite with XNNPACK