Deep learning inference on mobile and edge devices is more popular than ever, and we now have more options for AI-related development on these little companions than we might have guessed.

Deploying machine learning models, now standard for tasks such as computer vision, is faster and easier on mobile devices these days, and renewed competition among the developers of supporting frameworks has pushed the process to new heights in performance, flexibility, and adaptability.

That's no big surprise, considering how omnipresent edge devices such as smartphones, wearables, and IoT hardware have become, and how every tech company wants in on mobile ML development.

Performing deep learning tasks directly on mobile devices has many benefits: low latency, security, and increased personalization, to name a few. To make the most of these tasks, inference engines specifically optimized for such devices have been cropping up.

A few examples are TensorFlow Lite (Google), Core ML (Apple), and PyTorch Mobile (Facebook). Last year, following other Asian tech firms such as Xiaomi, Baidu, and Tencent, Alibaba joined the fray with its own offering: Mobile Neural Network (MNN), an open-source deep learning framework built to address the demanding requirements of high-traffic applications such as Taobao. This article discusses some of MNN's strengths, benchmarks, and design choices.

MNN’s role as a deep learning inference engine. (Credit)

The latest from Alibaba

MNN is a lightweight mobile ML framework that supports both inference and training of deep learning models, built specifically for edge devices. It is integrated into more than 20 Alibaba Inc. apps, including Taobao, Tmall, Youku, DingTalk, and Xianyu.

These apps use it for AI-specific tasks such as live broadcasting, short-video capture, search recommendation, searching for products by image, interactive marketing, equity distribution, security risk control, and so on. Touted by Alibaba as running more than 100 million times per day, MNN is also applied in IoT devices like Cainiao will-call cabinets.

“Compared with general-purpose frameworks like TensorFlow and Caffe2 that cover both training and inference, MNN focuses on the acceleration and optimization of inference and solves efficiency problems during model deployment so that services behind models can be implemented more efficiently on the mobile side.” — Jia Yangqing (VP and AI lead at Alibaba)

MNN tries to address a few key issues that inference engines face on mobile devices. First, most models destined for mobile devices are trained in well-known frameworks such as TensorFlow, PyTorch, and Caffe, so the basic requirement is that an engine must not only support these standard formats but also scale to future formats derived from them.

Additionally, MNN tries to keep up with device diversity at both the hardware level (architectures such as ARM CPUs and Adreno GPUs) and the software level (iOS, Android, and embedded OSes). All of this must be done while running high-performance inference with as little memory and energy as possible. Alibaba has made MNN open source and accessible to all on GitHub.

An overview of Mobile Neural Network (Credit: Alibaba)
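To get a feel for the format-support side in practice, here is a hedged sketch of converting a trained model into MNN's format by shelling out to the MNNConvert tool that ships with the project. The flag names mirror the project's documentation but should be verified against your installed version; the file names are placeholders.

```python
# Convert an ONNX export to MNN's .mnn format via the MNNConvert CLI.
# Flag names follow MNN's docs; file names are placeholders.
import subprocess

subprocess.run(
    [
        "MNNConvert",
        "-f", "ONNX",                         # source framework: TF, CAFFE, ONNX, TFLITE, ...
        "--modelFile", "mobilenet_v2.onnx",   # model exported from the training framework
        "--MNNModel", "mobilenet_v2.mnn",     # output consumed by the MNN runtime
        "--bizCode", "demo",                  # arbitrary tag embedded in the model
    ],
    check=True,
)
```

The same converter accepts TensorFlow, TensorFlow Lite, Caffe, and ONNX models, which is how MNN decouples the training framework from the on-device runtime.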


What MNN brings to the table

Alibaba has tried to put together a universally compatible inference engine while maximizing its efficiency, going the extra mile with these features:

  • A mechanism called pre-inference, which performs runtime optimization through online cost evaluation and optimal scheme selection (a toy sketch of this idea follows the list below).
  • In-depth kernel optimization by using improved algorithms and data layouts to boost the performance of some widely-used operations.
  • A backend abstraction module to enable hybrid scheduling and keep the engine as lightweight as possible. Integrating MNN into applications only increases the binary size by 400–600 KB.
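As a rough illustration of the first feature, pre-inference boils down to evaluating a cost model for each candidate execution scheme once, up front, and caching the winner so the decision isn't repeated on every run. The sketch below is a toy, not MNN's actual code; all names are hypothetical.

```python
# Toy illustration of cost-based scheme selection (the idea behind pre-inference).
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Scheme:
    name: str
    estimate_cost: Callable[[dict], float]   # cost model: op description -> estimated time
    run: Callable[[dict], None]              # kernel that would execute the op

def pre_inference(op_info: dict, schemes: Dict[str, Scheme]) -> Scheme:
    """Pick the scheme with the lowest estimated cost for this op on this device."""
    return min(schemes.values(), key=lambda s: s.estimate_cost(op_info))

# Hypothetical candidates for a 3x3 convolution: direct vs. Winograd.
schemes = {
    "direct": Scheme("direct",
                     lambda op: op["macs"] * 1.0,
                     lambda op: print("run direct convolution")),
    "winograd": Scheme("winograd",
                       lambda op: op["macs"] * 0.45 + 5_000,  # cheaper per MAC, fixed transform overhead
                       lambda op: print("run Winograd convolution")),
}

conv = {"macs": 2_000_000}
chosen = pre_inference(conv, schemes)   # decided once, reused for every subsequent inference
chosen.run(conv)
```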

We’ll delve further into what these features do in a minute. Additionally, the inference engine also boasts the following advantages for deep learning on the edge:

  • **High performance:** MNN makes full use of the ARM CPUs on mobile devices by implementing core computations in optimized assembly code. On iOS, GPU acceleration delivers faster speeds than Apple’s native Core ML. Similarly, on Android, OpenCL, Vulkan, and OpenGL backends are available and fine-tuned for mainstream GPUs (Adreno and Mali). The convolution and transposed-convolution algorithms are efficient and stable, and the Winograd algorithm significantly speeds up symmetric convolutions. According to Alibaba, speeds double on the new ARM v8.2 architecture.

  • **Lightweight:** Optimized for mobile and embedded devices, MNN is easily deployable and comes with no additional dependencies. On iOS, the static library size for the ARMv7+ARM64 platforms is about 5 MB, while on Android, the core library combined with OpenCL/Vulkan support is less than 1 MB.

  • **Versatility:** MNN supports popular model formats such as TensorFlow, Caffe, and ONNX, and common neural networks such as CNNs, RNNs, and GANs. The MNN model converter supports 149 TensorFlow ops, 58 TFLite ops, 47 Caffe ops, and 74 ONNX ops. Different hardware backends are also supported: 111 ops for CPU, 6 for ARM v8.2, 55 for Metal, 43 for OpenCL, and 32 for Vulkan. Compatible software versions are iOS 8.0+, Android 4.3+, and embedded devices with a POSIX interface. Hybrid computing across multiple devices is also possible (a minimal inference sketch follows this list).
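For a sense of what using the runtime looks like, below is a minimal inference sketch with MNN's Python bindings (`pip install MNN`), modeled on the project's published Python demos. The model path and input shape are placeholders, and the optional session config mentioned in the comments is an assumption to verify against your MNN version.

```python
# Minimal MNN inference sketch (Python bindings), following the pattern of
# MNN's own MobileNet demo. Model file and input shape are placeholders.
import numpy as np
import MNN

interpreter = MNN.Interpreter("mobilenet_v2.mnn")   # model produced by MNNConvert
session = interpreter.createSession()               # an optional config dict (threads/backend) can be passed; exact keys depend on the MNN version
input_tensor = interpreter.getSessionInput(session)

# Stand-in for a preprocessed image batch in NCHW layout.
image = np.random.random((1, 3, 224, 224)).astype(np.float32)
tmp_input = MNN.Tensor((1, 3, 224, 224), MNN.Halide_Type_Float,
                       image, MNN.Tensor_DimensionType_Caffe)
input_tensor.copyFrom(tmp_input)

interpreter.runSession(session)
output_tensor = interpreter.getSessionOutput(session)
print("predicted class:", int(np.argmax(output_tensor.getData())))
```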
