Benchmarking the Raspberry Pi 4

Last year’s release of the Raspberry Pi 3 Model A+ marked the end of an era: the next board, Raspberry Pi Foundation co-founder Eben Upton promised at the time, would be something dramatically different.

Now, a surprisingly short time later, Upton’s promise has been delivered: the Raspberry Pi 4 is a departure from the norm, and the first of a new generation of Raspberry Pi single-board computers. Gone is the old bottleneck of a single shared USB lane for everything connected to the SoC; gone too is the layout which has been with the boards since the Raspberry Pi Model B+.

Although appearing similar at first glance, the new board is slightly larger thanks to ports extending further from the PCB for improved case compatibility, the Ethernet and USB ports have been switched around, the power input is now a USB Type-C connector, and the full-size HDMI output has been swapped out for not one but two micro-HDMI connectors.

Internally, the SoC has been entirely overhauled. As well as getting rid of the bottleneck (the SoC now has around 5Gb/s of external bandwidth), there are USB 3.0 lanes for high-speed connectivity to external storage and accelerators, Cortex-A72 64-bit processing cores, and a more powerful VideoCore VI graphics processor, making this the first Pi ever to use anything other than the VideoCore IV launched with the original Model B. There’s also enough grunt to drive two 4K-resolution displays, as well as true gigabit Ethernet connectivity.

It’s benchmarking time.


SoC: Broadcom BCM2711B0 quad-core A72 (ARMv8-A) 64-bit @ 1.5GHz
GPU: Broadcom VideoCore VI @ 500MHz
RAM: 1GB, 2GB, or 4GB LPDDR4–3200 SDRAM (4GB as reviewed)
Networking: Gigabit Ethernet, 2.4GHz and 5GHz 802.11b/g/n/ac Wi-Fi
Bluetooth: Bluetooth 5.0, Bluetooth Low Energy (BLE)
Storage: MicroSD
GPIO: 40-pin GPIO header, populated
Ports: 2x micro-HDMI 2.0, 3.5mm analogue audio-video jack, 2x USB 2.0, 2x USB 3.0, Ethernet, Camera Serial Interface (CSI), Display Serial Interface (DSI)
Dimensions: 88mm x 58mm x 19.5mm, 46g

There’s a lot going on in the new Pi 4, including the first alteration to the Model B layout since the launch of the Raspberry Pi Model B+. Shifting the ports around has slightly increased the board’s footprint, as measured at its widest points including ports, and leaves the majority of cases incompatible. It is, however, measurably lighter than the Raspberry Pi 3 Model B+ at 46g to its predecessor’s 50g, likely aided by the loss of the full-size HDMI port.

On the SoC side, which is now produced on a 28nm process node, the core count is unchanged but the CPU has been shifted to the new Arm Cortex-A72 running at a slightly faster 1.5GHz. It’s the GPU, though, which has seen the biggest shift: every Raspberry Pi in history has used the Broadcom VideoCore IV GPU, whereas the Raspberry Pi 4 switches to a customised variant of the Broadcom VideoCore VI with scanout engine borrowed from the VideoCore V. The result: improved performance and the board family’s first support not only for 4K resolutions but across two dedicated HDMI outputs, switched to micro-HDMI for reasons of space. The shift to a new SoC has also brought with it support for more than 1GB of RAM, with the Pi 4 launching in 1GB, 2GB, and 4GB variants.

On the surface, the networking functionality is unchanged: there’s still 802.11ac Wi-Fi, though an upgrade to Bluetooth 5.0, plus a wired gigabit Ethernet port. Where the older Raspberry Pis have the Ethernet port talk to the SoC via a shared USB 2.0 lane, however, the Pi 4 enjoys a more direct connection without the bottleneck.

Finally, there are the USB ports. While there are still four full-size ports in total, two of these have been upgraded to USB 3.0, greatly improving the theoretical bandwidth available to external devices, from USB-connected accelerators like the Google Coral Edge TPU to USB storage. The power input’s switch to USB Type-C, rated for 3A against the 2.5A of the micro-USB connector on older models, also allows for higher-power devices to be connected without needing a powered hub.

Thermal Benchmark

A more powerful processor typically means more excess heat, something with which the Raspberry Pi family has struggled in the past. The Raspberry Pi 3 Model B, in particular, ran hot; its replacement, the Raspberry Pi 3 Model B+, went quite some way to improving things with a thicker PCB, a tweaked and metal-encased SoC, and improved thermal bonding. A thermal camera provides a view of where this heat is generated and how it spreads through the entire Raspberry Pi.

The new Broadcom BCM2711B0 SoC has the same packaging as its predecessor, but is clearly more powerful in both senses of the word: thermal imagery of the board (bottom) shows the SoC, after a ten-minute CPU-focused workload, running noticeably hotter than that of the Raspberry Pi 3 Model B+ (top) and spreading that heat throughout the board. Peak spot temperatures at the end of the ten-minute run were measured at 62.6°C on the Raspberry Pi 3 Model B+ and 74.5°C on the Pi 4; both readings, unsurprisingly, were centred on the SoC.

It’s an analysis easily verified in use: after just a few minutes, the entire board feels warm to the touch. Start loading it heavily and that warmth becomes uncomfortable; while it’s still entirely possible to use the board without extra cooling, those looking to put one in a case will find active cooling is required to avoid thermal throttling.

Power Draw Benchmark

If something’s producing more heat, it’s guaranteed to be drawing more power. A big reason for the shift from micro-USB to USB Type-C for the Raspberry Pi 4’s power jack comes from being able to qualify it at higher current levels: 3A, up from 2.5A — already half an amp over standard ratings — on the previous jack.

This benchmark, which measures power draw at the wall for every mainstream model of Pi released so far, confirms the hypothesis: more heat means more power. At 3.4W idle (a figure which may come down with post-launch firmware optimisations) and 7.6W under load, the Raspberry Pi 4 is the most power-hungry design the Raspberry Pi Foundation has yet released.
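To put those two figures in context, here is a rough sketch of what each would add up to over a year. The 3.4W and 7.6W values are the measurements quoted above; the assumption that the board runs continuously at one level all year is purely illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # assumes continuous operation over a non-leap year


def annual_energy_kwh(watts):
    """Convert a constant power draw in watts to kilowatt-hours per year."""
    return watts * HOURS_PER_YEAR / 1000


for label, watts in (("idle", 3.4), ("load", 7.6)):
    print(f"Pi 4 at {label}: {annual_energy_kwh(watts):.1f} kWh/year")
```

A real deployment will sit somewhere between the two, depending on how often the board is loaded.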

The positioning of the boards, then, remains unchanged: if you need performance, the full-fat Raspberry Pi 4 is the board to get; if you need to balance performance and power draw, the Pi 3 A+ is difficult to beat for its significantly lower idle draw; if power is key, the Raspberry Pi Zero and Wi-Fi-enabled Raspberry Pi Zero W should be top of the list.

Thermal Throttling Benchmark

When a Raspberry Pi’s system-on-chip gets hot, it — just like any other modern semiconductor — takes action to protect itself from harm, by reducing its operating speed in order to cool itself down. Under brief, bursty workloads — browsing the web, say — this throttling doesn’t happen; only in a sustained, heavy workload does it rear its head. In this benchmark the Raspberry Pi 4 is subjected to a ten-minute run of a CPU-centric stress-testing utility, stress-ng, and the temperature and clock speed measured once every second using the SoC’s internal sensors.
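A measurement loop of this kind could be sketched as follows. This is a minimal sketch rather than the script used for the article: it assumes the standard Linux sysfs files for temperature and CPU frequency are present (as they are on Raspbian), and it assumes stress-ng is started separately in another terminal.

```python
import time

# Assumed sysfs locations; the article only says "the SoC's internal sensors".
TEMP_PATH = "/sys/class/thermal/thermal_zone0/temp"  # millidegrees Celsius
FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"  # kHz


def read_int(path):
    """Read a single integer value from a sysfs-style text file."""
    with open(path) as f:
        return int(f.read().strip())


def sample(temp_path=TEMP_PATH, freq_path=FREQ_PATH):
    """Return (temperature in degrees C, CPU clock in MHz)."""
    return read_int(temp_path) / 1000, read_int(freq_path) / 1000


def log_run(duration_s=600, interval_s=1.0, temp_path=TEMP_PATH,
            freq_path=FREQ_PATH, sleep=time.sleep):
    """Collect (nominal elapsed seconds, temp C, clock MHz) once per interval."""
    rows = []
    for i in range(int(duration_s / interval_s)):
        temp_c, mhz = sample(temp_path, freq_path)
        # Elapsed time is nominal: it ignores the time spent reading the files.
        rows.append((i * interval_s, temp_c, mhz))
        sleep(interval_s)
    return rows
```

Calling log_run() with its defaults on the Pi while the stress test runs gives ten minutes of one-per-second readings, ready to be written out to a CSV file or plotted.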

The initial ramp-up from the SoC’s idle temperature is rapid, though it must be noted this test took place in an ambient temperature of nearly 25°C. Once the SoC hits 70°C after around 25 seconds, the temperature rise slows; by two minutes it is around 77°C; by three minutes it’s around 81°C. As is to be expected, these readings — taken from an internal sensor located on the GPU portion of the SoC — are higher than the external package temperature measured during the thermal imaging test.

Interestingly, the first thermal throttle operation isn’t captured until around four and a half minutes into the test, and a quick glance at the rest of the graph shows why: where earlier Raspberry Pi models would tend to hit a throttle point and stay there, the Raspberry Pi 4 instead spends as little time as possible at its throttled clock speed of 1GHz, returning to 1.5GHz as quickly as it can. With the measurements being taken at a rate of one per second but the CPU’s frequency switching taking place on a much shorter timescale, it’s likely the Raspberry Pi 4 was throttling earlier but for too short a time to be captured.
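Given a log of once-per-second readings, the first captured throttle and the share of time spent throttled can be pulled out with a few lines. The sketch below assumes each reading is a (seconds, temperature, clock in MHz) tuple and that 1500MHz is the unthrottled clock; the sample data is made up for illustration and is not taken from the article's run.

```python
def throttle_summary(samples, nominal_mhz=1500):
    """Summarise throttling in a list of (seconds, temp_c, clock_mhz) samples.

    Returns (time of the first throttled sample or None,
             fraction of samples below the nominal clock).
    """
    samples = list(samples)
    throttled = [t for t, _temp, mhz in samples if mhz < nominal_mhz]
    first = throttled[0] if throttled else None
    fraction = len(throttled) / len(samples) if samples else 0.0
    return first, fraction


# Illustrative readings only: one brief dip to 1GHz at the three-second mark.
example = [(0, 50.1, 1500), (1, 79.8, 1500), (2, 81.2, 1500), (3, 82.0, 1000)]
print(throttle_summary(example))
```

As with the real measurements, any dip shorter than the sampling interval will simply not appear in the log, so the fraction is a lower bound.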

This benchmark clearly demonstrates additional cooling is going to be a must-have to maintain top performance for workloads involving sustained CPU activity beyond the four-minute mark. Those using the device as-is, though, can expect to see less sustained throttling than in previous models once the thermal throttle point is reached.

Linpack Benchmark

A synthetic benchmark with a long history, Linpack is a great way to get an idea of peak compute performance. Here, the same implementation of the Linpack benchmark is used across all models to level the playing field; by compiling for a specific model and using a variety of other tweaks, the absolute performance figure for a given Pi can be dramatically increased but at the cost of making any comparison between hardware platforms apples-to-oranges.

The Raspberry Pi 4 might only be 100MHz faster in clock speed, but the move to Arm Cortex-A72 processor cores has had a dramatic effect on its Linpack performance. The board absolutely dominates the table, with its single-precision (SP), double-precision (DP), and NEON-accelerated single-precision (SP NEON, a mode available only on the Raspberry Pi 2 and upwards) scores sitting between three and four times faster than the Raspberry Pi 3 Model B+ and Model A+.

Memory Throughput Benchmark

The CPU isn’t the only thing improved in the Raspberry Pi 4. The switch to a new SoC has unlocked a whole new world of RAM, moving the platform from the long-in-the-tooth LPDDR2 to LPDDR4 in 1GB, 2GB, and 4GB capacities.

The biggest change in memory performance remains the shift away from the single-core BCM2835 of the original Raspberry Pi and compact Pi Zero families, but this benchmark, which performs read and write operations in 1MB block sizes, shows the switch to LPDDR4 has boosted things nicely.

For many workloads, simply having more RAM — up to four times as much, depending on the variant in question — will have a bigger real-world impact; the fact the RAM is also faster is, in these cases, just an added bonus.

File Compression Benchmark

Synthetic benchmarks are one thing; what’s more interesting is testing real-world workloads. Here a file is compressed using the bzip2 algorithm and the time it takes measured. For Raspberry Pi models with more than one CPU core, the test is repeated using the multi-threaded lbzip2.
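For anyone wanting to try something similar, a timed compression run can be sketched in a few lines. Note the assumptions: this uses Python's built-in bz2 module as a stand-in for the bzip2 command-line tool used in the article, it is single-threaded only, and the payload is hypothetical test data rather than the file used for the published results.

```python
import bz2
import time


def time_bzip2(data, level=9):
    """Compress bytes with bzip2; return (seconds taken, compressed size)."""
    start = time.perf_counter()
    compressed = bz2.compress(data, compresslevel=level)
    return time.perf_counter() - start, len(compressed)


# Hypothetical, highly repetitive payload of roughly 3MB.
payload = b"Raspberry Pi benchmark payload\n" * 100_000
seconds, size = time_bzip2(payload)
print(f"{len(payload)} bytes -> {size} bytes in {seconds:.3f}s")
```

Repetitive data like this compresses unrealistically well; a large real-world file gives a fairer comparison between boards.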

As with benchmarks on previous generations, the biggest gain comes from the move from single-core to quad-core architectures; the Raspberry Pi 4’s Cortex-A72 cores, though, accelerate things considerably over older boards.

The Raspberry Pi 3 Model A+ is an interesting result on this test: its performance in the single-threaded portion of the test lags considerably behind the Raspberry Pi 3 Model B+ despite having the same SoC; this can be explained by having considerably less RAM to play with, a key component in efficient compression of larger files.

Image Editing Benchmark

Another example of a real-world workload, this test uses the command-line scripting interface of popular image-editing application GIMP to edit a high-resolution image. It’s a workload which relies on both CPU and RAM performance, while also demanding a large chunk of free RAM — something which penalises models with less than 1GB.

While not as dramatic a difference as the file compression workload, it’s clear from the results the Raspberry Pi 4 offers a measurable improvement in image editing performance over its predecessors.

Here, too, it’s clear the difference in RAM between the Raspberry Pi 3 Model B+ and the smaller, cheaper Model A+ has a discernible impact on performance: because of the high resolution of the image on test, the Model A+ is forced to swap memory contents out to make room, something the 1GB Raspberry Pi 3 Model B+ avoids, and which is definitely not an issue for the 4GB Raspberry Pi 4 on test.

Browser Benchmark

This benchmark should be of interest to anyone thinking of using a Raspberry Pi as a low-power desktop replacement: it tests whether there are enough hardware resources available to make browser-based applications run smoothly. To do so, the Speedometer 2.0 benchmark is loaded into the stock Chromium web browser; once set running, it returns a result in runs-per-minute.

From the results table it’s clear a modern web app running in a modern web browser is not something a single-core processor can really handle. Moving from the fastest of the single-core designs, the Raspberry Pi Zero family, to the quad-core Raspberry Pi 2 has an oversized impact on performance. Here, too, the impact of dropping from the 1GB of RAM available in the Raspberry Pi 3 Model B+ to the 512MB of the Raspberry Pi 3 Model A+ can be seen.

The Raspberry Pi 4, meanwhile, absolutely romps to the top of the leader-board. Where earlier models may have offered acceptable performance for web apps, the new board can definitely hold its head up high as a potential desktop replacement for the casual web user.

Oddly, while the benchmark ran — just about — on an original Raspberry Pi Model A, the Model A+ on test absolutely refused, leading to its absence from the results chart.

Gaming Benchmark

The Raspberry Pi family has long been a favourite of retro gamers, but is strangely overlooked as a platform for first-party titles. Since the Raspberry Pi 2, though, it has boasted support for full OpenGL — rather than the embedded-focused OpenGL ES — hardware acceleration, though its performance has always been constrained by the ageing VideoCore IV GPU. Here real-world gaming performance is measured using the built-in timedemo in the open-source Quake III Arena-based multiplayer shooter OpenArena, running at a 1280x720 (720p, HD) resolution.

Although the Raspberry Pi 2 and Raspberry Pi 3 families share the same GPU, at 720p the CPU performance of the Raspberry Pi 2 throttles the GPU: moving to any model of the Raspberry Pi 3 range adds an additional 37 percent or so to the achievable frame rate. Moving between models in the Raspberry Pi 3 family, by contrast, does little to the now-GPU-constrained workload, despite the boosted CPU performance.

The Raspberry Pi 4’s VideoCore VI, on the other hand, offers a considerable improvement, boosting the game’s frame rate by nearly 50 percent over the Raspberry Pi 3 Model B+ to a very playable 41 frames per second average. If GPU performance had been holding your projects back, the Raspberry Pi 4 may be the answer.

Results for the single-core models, meanwhile, are not available; the driver which enables true OpenGL hardware acceleration is not officially available on these devices.

GPIO Benchmark

This test sits somewhere between a real-world and a synthetic workload: while addressing the general-purpose input/output (GPIO) header from Python using the gpiozero library is a common workload for a Raspberry Pi, this benchmark looks at things from a worst-case CPU-constrained perspective. A short program simply toggles a pin on and off without pause, and the rate at which the pin toggles is measured using a frequency counter.
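The kind of toggle loop described above might look something like this. It is a sketch under assumptions, not the article's test program: the pin number (GPIO17) is hypothetical, and because the article measured the pin with an external frequency counter, the software count here is only a rough stand-in that adds its own timing overhead to the loop.

```python
import time


def toggles_per_second(on, off, duration_s=1.0):
    """Call on() then off() in a tight loop; return on/off cycles per second."""
    cycles = 0
    deadline = time.perf_counter() + duration_s
    while time.perf_counter() < deadline:
        on()
        off()
        cycles += 1
    return cycles / duration_s


def run_on_pi(pin=17):
    """Hypothetical entry point: toggle a GPIO pin through gpiozero on a Pi."""
    from gpiozero import LED  # imported here so the helper works off-device

    led = LED(pin)
    return toggles_per_second(led.on, led.off)
```

On a Raspberry Pi with gpiozero installed, run_on_pi() returns an approximate toggle rate in Hz; a frequency counter or oscilloscope on the pin remains the more trustworthy measurement.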

There’s no surprise to find the GPIO benchmark heavily tied to CPU performance; what is perhaps surprising is seeing such a dramatic difference from the Raspberry Pi 3 family to the Raspberry Pi 4 — all thanks to the shift to the new Arm Cortex-A72 CPU cores.

Ethernet Benchmark

The resolution, after all these years, of the single-USB-lane bottleneck from the SoC to the rest of the board promises a dramatic improvement in network performance. While the Raspberry Pi 3 Model B+ was the first to include gigabit Ethernet connectivity, the bottleneck prevented it from reaching anywhere near its theoretical maximum throughput — but how does the Raspberry Pi 4 compare?

Ignoring the systems without an Ethernet port at all, this benchmark shows how stagnant the Ethernet performance was until the launch of the Raspberry Pi 3 Model B+: while the original Raspberry Pi Model B sits at the bottom of the table, the Raspberry Pi Model B+, Raspberry Pi 2, and Raspberry Pi 3 are all barely ahead.

The Raspberry Pi 4, by contrast, shows a throughput within spitting distance of the theoretical maximum. Better still, the removal of the USB bottleneck now means Ethernet and USB throughput aren’t linked — a boon for anyone thinking of building a Raspberry Pi-powered network attached storage (NAS) system.

Wi-Fi Benchmark

Ethernet isn’t the only built-in networking available on the Raspberry Pi family. Since the launch of the Raspberry Pi 3 Model B, built-in Wi-Fi — on a radio which also offers Bluetooth and Bluetooth Low Energy (BLE) connectivity — has been available as standard, with the Raspberry Pi 3 Model B+ adding dual-band support. In this test, each model with Wi-Fi capability is tested in an ideal environment: line-of-sight to an 802.11ac router with only one other client, a laptop on a wired connection.

The throughput on the 2.4GHz band is largely unchanged from the Raspberry Pi 3 Model A+ and B+ boards, which offer a boost over the Raspberry Pi 3 Model B which in turn is faster than the Raspberry Pi Zero W, but it’s the 5GHz band where the Pi 4 pulls away from its predecessors. While the difference between 97.6Mb/s on the Raspberry Pi 3 Model B+ and 114Mb/s on the Raspberry Pi 4 isn’t huge, it’s a welcome gain nevertheless.
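The size of that 5GHz gain is easy to work out from the two figures quoted above; the helper name below is just for illustration.

```python
def percent_gain(old, new):
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100


# 5GHz throughput in Mb/s: Raspberry Pi 3 Model B+ versus Raspberry Pi 4.
print(f"{percent_gain(97.6, 114):.1f}% faster")  # prints "16.8% faster"
```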

USB Throughput Benchmark

The low cost and low power draw of the Raspberry Pi family have long made it a logical choice for homebrew network attached storage (NAS) implementations. Sadly, the single shared USB lane and the use of USB 2.0 ports has always put a hard limit on performance — something the Raspberry Pi 4’s two USB 3.0 ports should, in theory, address. Here, a SATA SSD is connected through an adapter to the Raspberry Pi’s USB port — USB 3.0 in the case of the Raspberry Pi 4, USB 2.0 in all other cases — and the fio utility used to benchmark its read and write performance.
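For a quick sanity check without fio, a sequential write and read test can be sketched in Python. This is an assumption-laden stand-in rather than an equivalent of the fio runs above: the 1MB block size and 64MB default are arbitrary choices, and the read figure in particular may be served from the operating system's page cache rather than the drive.

```python
import os
import time

BLOCK = 1024 * 1024  # 1MB blocks


def sequential_write_mb_s(path, total_mb=64):
    """Write total_mb of random data in 1MB blocks; return MB/s including fsync."""
    block = os.urandom(BLOCK)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data has actually left the cache
    return total_mb / (time.perf_counter() - start)


def sequential_read_mb_s(path):
    """Read the file back in 1MB blocks; return MB/s (possibly cache-assisted)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK)
            if not chunk:
                break
            total += len(chunk)
    return total / BLOCK / (time.perf_counter() - start)
```

Point both functions at a file on the mounted drive under test; for read figures that reflect the hardware, use a file much larger than the board's RAM or a dedicated tool such as fio.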

The only real surprise here is how static USB performance has been in the Raspberry Pi family: from the original launch model right through to the Raspberry Pi 3 Model A+, USB throughput has been near-identical thanks to the single-lane shared bottleneck to the SoC.

The Raspberry Pi 4 is the first to change that: its read and write throughput are leagues ahead of its predecessors, and approaching — though not quite reaching — the limits of the connected drive itself.

MicroSD Throughput Benchmark

The USB ports aren’t the only storage interface to receive an upgrade on the Raspberry Pi 4: the microSD storage system has been entirely overhauled, adding in double data rate (DDR) support for improved throughput — up to, at least theoretically, double its predecessors. The following benchmark was carried out using a 64GB Samsung Evo Plus microSD XC (SDXC) Class 10/U3 card, officially rated by the manufacturer at 100MB/s read and 60MB/s write.

The shift to a DDR interface has an obvious effect on the throughput achievable with a decent microSD card, though it’s most obvious in the read speed; write, meanwhile, enjoys a smaller but still noticeable performance boost. At around 46MB/s read throughput, one thing is clear: those relying on high-speed storage will be far better off booting the Raspberry Pi 4 from a USB 3.0 external drive than using any size of microSD card. Sadly, they’ll also have to wait: the ability to boot from USB and Ethernet didn’t make it in time for launch, but Ethernet boot will be available in an early update with USB boot to follow on later.

Physical Benchmarks

The Raspberry Pi 4’s layout is slightly tweaked compared to the original Model B/B+ design, though you have to be paying close attention to notice the changes beyond the moved Ethernet port and the new micro-HDMI connectors.

The increased footprint, measured where the board is widest and including its connectors which stand proud of the PCB edge, is marginal and comes from a few-millimetre change in how far the connectors stick out. The change, Upton explains, came about to improve casability — how well the Pi sits in both first- and third-party cases. Combined with the new micro-HDMI, USB Type-C, and swapped USB and Ethernet connectors, though, it means most cases designed for older models won’t be compatible without modification.

The weight, meanwhile, has dropped from the Raspberry Pi 3 Model B+’s chart-topping 50g to a lighter 46g — a figure which still makes the Raspberry Pi 4 the second-heaviest Raspberry Pi in the line-up. As always, anyone with a weight constraint to their project — such as high-altitude ballooning or drone control — should look to the Raspberry Pi Zero family.


There’s no denying the Raspberry Pi 4 is an impressive machine. While the loss of the full-size HDMI port is a disappointment, the fact it’s now possible to drive two displays simultaneously — and at 4K resolution, no less — definitely makes up for it. The switch to a USB Type-C connector for power makes sense, as that’s where the smartphone and tablet market has been going for some time, and the tweaked layout will cause short-term pain in the form of now-obsolete case designs but for long-term gain.

That said, not everything is an improvement. The powerful new BCM2711 SoC really pumps out the heat, and active cooling is more important than ever for projects which put a frequently-loaded Raspberry Pi into an enclosed area. Those using the optional Raspberry Pi Power over Ethernet (PoE) HAT will be sorted here thanks to its in-built fan, while a small heatsink attached to the top of the SoC will improve things still further. The promised 4K video playback is limited to H.265 content, too, while hardware decode for MPEG2, MPEG4, and H.263 has been dropped on the understanding the CPU is powerful enough to decode these formats in software without too much strain.

The cons in no way outweigh the pros, though, and with the Raspberry Pi 4 the Foundation has addressed a laundry-list of niggles and complaints the community has been voicing since the original Raspberry Pi Model B: the USB bottleneck is gone, there are high-speed ports for external devices, options for more than 1GB of RAM for those who need it, and dual-4K-display outputs — 4K60 on one display or 4K30 on two, unofficially extendable to 4K60 across two displays if you don’t mind overclocking the GPU and running the risk of display corruption if you use too many layers for the video scaler’s available bandwidth. Coupled with the release of a new Raspbian Linux distribution, based on Debian ‘Buster,’ the Raspberry Pi 4 is the first to realistically act as a true desktop replacement for a wide swathe of the computer-using populace.

There are additional under-the-hood improvements not addressed in this testing, too. Chief among these is the addition of extra buses: there are four UART serial buses, four SPI buses, and four I²C buses, which will be welcomed by those building more complicated creations. The board’s pulse-width modulation (PWM) functionality has been upgraded, and is no longer shared with the analogue audio output available on the 3.5mm AV jack.

For most users, the Raspberry Pi 4 will become the must-have board in the range; only those with power, size, weight, or budgetary restraints should look at the other models in the family.

Raspberry Pi 4 on the Raspberry Pi 4 - Computerphile

A quick tour of the Raspberry Pi 4 edited on the Raspberry Pi 4. Dr Steve Bagley gets out his knife.dll to unbox Sean's purchases!

Learn Raspberry Pi for Image Processing Applications

New to the newly launched Raspberry Pi 3? Learn all the components of Raspberry Pi, connecting components to Raspberry Pi, installation of NOOBS operating system, basic Linux commands, Python programming and building Image Processing applications on Raspberry Pi. At just $9.

Image Processing Applications on Raspberry Pi is a beginner course on the newly launched Raspberry Pi 3 and is fully compatible with Raspberry Pi 2 and Raspberry Pi Zero.

The course is ideal for those who are new to the Raspberry Pi and want to explore more about it.

You will learn the components of Raspberry Pi, connecting components to Raspberry Pi, installation of NOOBS operating system, basic Linux commands, Python programming and building Image Processing applications on Raspberry Pi.

This course will take beginners without any coding skills to a level where they can write their own programs.

Basics of Python programming language are well covered in the course.

Building Image Processing applications is taught in the simplest manner, which is easy to understand.

Users can quickly learn hardware assembly and coding in Python programming for building Image Processing applications. By the end of this course, users will have enough knowledge about Raspberry Pi, its components, basic Python programming, and execution of Image Processing applications in the real time scenario.

The course is taught by an expert team of Electronics and Computer Science engineers, having PhD and Postdoctoral research experience in Image Processing.

Anyone can take this course. No engineering knowledge is expected. Tutor has explained all required engineering concepts in the simplest manner.

The course will enable you to independently build Image Processing applications using Raspberry Pi.

This course is the easiest way to learn and become familiar with the Raspberry Pi platform.

By the end of this course, users will build Image Processing applications which include scaling and flipping images, varying the brightness of images, performing bit-wise operations on images, blurring and sharpening images, thresholding, erosion and dilation, edge detection, and image segmentation. Users will also be able to build real-world Image Processing applications including real-time detection of human faces, eyes, and noses, detecting cars in video, real-time object detection, human face recognition, and many more.

The course provides complete code for all Image Processing applications, compatible with Raspberry Pi 3/2/Zero.

Who is the target audience?

Anyone who wants to explore Raspberry Pi and is interested in building Image Processing applications

Machine Learning on the Raspberry Pi

3 Frameworks for Machine Learning on the Raspberry Pi

The revolution of AI is reaching new heights through new mediums. We’re all enjoying new tools on the edge, but what are they? What products and frameworks will fuel the inventions of tomorrow?

If you’re unfamiliar with why Machine Learning is changing our lives, have a read here.

If you’re already excited about Machine Learning and you’re interested in utilizing it on devices like the Raspberry Pi, enjoy!

Simple object detection on the Raspberry Pi

I’ve implemented three different tools for detection on the Pi camera. While it’s a modern miracle that all three work, it’s important for creators to know “how well” because of #perfmatters.

Our three contenders are as follows:

  1. Vanilla Raspberry Pi 3 B+: No optimizations, just using the TensorFlow framework on the device for simple recognition.
  2. Intel’s Neural Compute Stick 2: Intel’s latest USB interface device for neural networks, boasting 8x perf over the first stick! Around $80 USD.
  3. Xnor: A proprietary framework that reconfigures your model to run efficiently on smaller hardware. Xnor’s binary logic shrinks 32-bit floats to 1-bit operations, allowing you to optimize deep learning models for simple devices.
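To give a feel for what "shrinking 32-bit floats to 1-bit operations" can mean, here’s a toy sketch of a binarized dot product. To be clear, this is the general idea behind binary neural networks as I understand it, not Xnor’s actual (proprietary) implementation, and the function names are my own.

```python
def binarize(values):
    """Pack the signs of a float vector into an int: bit i is 1 if values[i] >= 0."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1, +1} vectors from their packed sign bits.

    XOR marks the positions where the signs differ; every difference
    contributes -1 and every match +1, so the result is n - 2 * differences.
    """
    differences = bin(a_bits ^ b_bits).count("1")
    return n - 2 * differences


a = [0.7, -1.2, 0.1, -0.4]  # signs: +, -, +, -
b = [0.5, 0.9, -0.3, -2.0]  # signs: +, +, -, -
print(binary_dot(binarize(a), binarize(b), len(a)))  # prints 0
```

Counting matches with XNOR instead of mismatches with XOR gives the same answer; either way a whole vector’s worth of multiplies collapses into one bitwise operation and a bit count, which is why this style of model suits small hardware.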

Let’s evaluate all three with simple object detection on a camera!

Vanilla Raspberry Pi 3 B+

A Raspberry Pi is like a small, wimpy Linux machine for $40. It makes it easy to run high-level applications and code on IoT-style devices. Though it sounds like I can basically use laptop machine learning on the device, there’s one big gotcha. The RPi has an ARM processor, and that means we’ll need to recompile our framework, i.e. TensorFlow, to get everything running.

⚠️ While this is not hard, this is SLOW. Expect this to take a very… very… long time. This is pretty much the fate of anything compiled on the Raspberry Pi.


Here are all the steps I did, including setting up the Pi camera for object detection. I'm simply including this for posterity. Feel free to skip reading it.

Install the Pi, then the camera, then edit /boot/config.txt.
Add disable_camera_led=1 to the bottom of the file and reboot.

Best to disable screensaver mode, as some follow-up commands may take hours

sudo apt-get install xscreensaver

Then disable screen saver in the “Display Mode” tab.

Now get Tensorflow Installed

sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get update
sudo apt-get install libatlas-base-dev
sudo apt-get install libjasper-dev libqtgui4 python3-pyqt5
pip3 install tensorflow
sudo apt-get install libjpeg-dev zlib1g-dev libxml2-dev libxslt1-dev
pip3 install pillow jupyter matplotlib cython
pip3 install lxml # this one takes a long time
sudo apt-get install python3-tk


sudo apt-get install libtiff5-dev libjasper-dev libpng12-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt-get install libxvidcore-dev libx264-dev
sudo apt-get install qt4-dev-tools
pip3 install opencv-python

Install Protobuff

sudo apt-get install autoconf automake libtool curl

Then pull down protobuff and untar it.

Then cd in and then run the following command which might cause the computer to become unusable for the next 2+ hours. Use ctrl + alt + F1, to move to terminal only and release all UI RAM. Close x process with control + c if needed. You can then run the long-running command. Base username “pi” and password “raspberry”

make && make check

You can then install simply with

sudo make install
cd python
export LD_LIBRARY_PATH=../src/.libs
python3 setup.py build --cpp_implementation
python3 setup.py test --cpp_implementation
sudo python3 setup.py install --cpp_implementation
sudo ldconfig

Once this is done, you can clean up some install crud with sudo apt-get autoremove, delete the tar.gz download, and then finally reboot with sudo reboot now, which will return you to a windowed interface.

Setup Tensorflow

mkdir tensorflow1 && cd tensorflow1
git clone --recurse-submodules \
Modify ~/.bashrc to contain a new env var named PYTHONPATH, as such:
export PYTHONPATH=$PYTHONPATH:/home/pi/tensorflow1/models/research:/home/pi/tensorflow1/models/research/slim

Now go to the zoo:
We’ll take the ssdlite_mobilenet, which is the fastest! Wget the file and then tar -xzvf the tar.gz result and delete the archive once untarred. Do this in the object_detection folder in your local tensorflow1 folder. Now cd up to the research dir. Then run:

protoc object_detection/protos/*.proto --python_out=.

This converted the object detection protos files to python in the proto folder

Done Installing!!

Special thanks to Edje Electronics for sharing their wisdom on setup, an indispensable resource for my own setup and code.

Once I got Tensorflow running, I was able to run object recognition (with the provided sample code) on Mobilenet at 1 to 3 frames per second.

Vanilla Pi Results

For basic detection, 1 to 3 frames per second isn’t bad. Removing the GUI or lowering the camera input quality speeds up detection. That makes this setup a perfectly good detector for simple use cases. What a great baseline! Let’s see if we can make it better with the tools available.
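For anyone who wants to compare numbers on their own setup, here is one way the frames-per-second figure could be measured. It's a minimal sketch, not the sample code's own counter: `FpsMeter` is a hypothetical helper that averages over a rolling window of recent frames.

```python
import time
from collections import deque

class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        # Only the most recent `window` timestamps are kept.
        self.stamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one processed frame and return the current FPS estimate.

        `now` can be supplied for testing; otherwise a monotonic clock is used.
        """
        self.stamps.append(time.perf_counter() if now is None else now)
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        # N timestamps span N - 1 frame intervals.
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Call meter.tick() once at the end of each pass through the detection loop and print or overlay the returned value. Averaging over a window smooths out the frame-to-frame jitter you get on a busy Pi.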

Intel’s Neural Compute Stick 2

This concept excites me. For those of us without GPUs readily available, running models on the edge instead of in the cloud, and bringing that kind of speed to the Raspberry Pi, is just exciting. I missed the original stick, the “Movidius”, but from this graph, it looks like I chose a great time to buy!


My Intel NCS2 arrived quickly and I enjoyed unboxing actual hardware for accelerating my models. That was probably the last moment I was excited.

Firstly, the USB stick takes up a lot of space. You’ll want to get a cable to keep it away from the base.

That’s a little annoying but fine. The really annoying part was trying to get my NCS 2 working.

There are lots of tutorials for the NCS by third parties, and following them got me to a point where I thought the USB stick might be broken!

Nothing I found for the original NCS worked (it kept telling me the stick wasn’t plugged in!), and everything I found on the NCS2 was pretty confusing. For a while, the NCS2 didn’t even work on ARM processors!


After a lot of false trails, I finally found and began compiling C++ examples (sorry Python) that only understood USB cameras (sorry PiCam). Compiling the examples was painful. Often the entire Raspberry Pi would become unusable, and I’d have to reboot.

locked up at 81% for 24 hours

The whole onboarding experience was more painful than recompiling Tensorflow on the raw Pi. Fortunately, I got everything working!

The result!?

NCS2 Stick Results

6 to 8 frames per second… ARE YOU SERIOUS!? After all that?

It must be a mistake. Let me run the perfcheck project.

10 frames per second…

From videos of the original NCS running Python, I saw around 10 FPS. Where’s the 8x boost? Where’s the reason for $80 hardware attached to a $40 device? To say I was let down by Intel’s NCS2 is an understatement. The user experience and final results were frustrating, to put it lightly.

Xnor.ai

Xnor.ai is a self-contained software solution for deploying fast and accurate deep learning models to low-cost devices. As many discrete logic enthusiasts might have noticed, Xnor is the logical complement of the bitwise XOR operator. If that doesn’t mean anything to you, that’s fine. Just know that the people who created the YOLO algorithm are alluding to the use of the logical operator to compress complex 32-bit computations down to 1-bit by utilizing this inexpensive operation and keeping track of the CPU stack.

In theory, avoiding the complex calculations that normally call for a GPU should speed up execution on edge devices. Let’s see if it works!
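To make the 1-bit idea a little more concrete, here is a toy version of the general XNOR-and-popcount trick used in binarized networks. It's only a sketch of the concept, not Xnor.ai's actual implementation (which I haven't seen); `pack_signs` and `binary_dot` are names I made up. The assumption is that every weight and activation has been squeezed down to +1 or -1, stored as a single bit.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into one int, one bit each (1 means +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where the signs agree; each agreement
    contributes +1 to the dot product and each disagreement -1.
    """
    mask = (1 << n) - 1                  # keep only the n meaningful bits
    agree = ~(a_bits ^ b_bits) & mask    # XNOR: 1 where the signs match
    matches = bin(agree).count("1")      # popcount
    return 2 * matches - n               # matches minus mismatches

if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    print(binary_dot(pack_signs(a), pack_signs(b), len(a)))
    print(sum(x * y for x, y in zip(a, b)))  # same value, computed the slow way
```

A whole vector's worth of multiply-and-add collapses into one XNOR and one bit count, which is the kind of work a small CPU handles happily.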


Setup was insanely easy. I had an object detection demo up and running in 5 minutes. 5 MINUTES!

The trick with Xnor.ai is that, much like the NCS2 Stick, the model is modified and optimized for the underlying hardware fabric. Unlike Intel’s haphazard setup, everything is wrapped in friendly Python (or C) code.

model = xnornet.Model.load_built_in()

That’s nice and simple.

But it means nothing if the performance isn’t there. Let’s load their object detection model.

Again, no complexity: they have one with no overlay, and one with. Since the others (except for perfcheck on NCS2) were with overlays, let’s use that.

Xnor.ai Results

JAW… DROPPING… PERFORMANCE. I not only get a stat on how fast inference could work, but I also get an overall FPS with my overlay that blew everything else out of the water.

OVER 12 FPS and an inference speed over 34 FPS!?

This amazing throughput is achieved with no extra hardware purchase!? I’d call Xnor the winner at this point, but it seems a little too obvious.

I was able to heat up my device and open a browser in the background to get it down to 8+ FPS, but even then, it’s a clear winner!

Xnor hype is real

The only negative I can give you on Xnor.ai is that I have no idea how much it costs. The evaluation model has a limit of 13,500 inferences per startup.

When I emailed them about pricing, I learned that they are just breaking into non-commercial use, so they haven’t created a pricing system yet. Fortunately, the evaluation model would be fine for most hobbyists and prototypes.
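To get a feel for what the 13,500-inference limit means in practice, here is a bit of back-of-the-envelope arithmetic. It's a rough sketch that assumes every processed frame uses up exactly one inference, which I haven't confirmed; `seconds_until_limit` is just an illustrative helper.

```python
EVAL_INFERENCE_LIMIT = 13_500  # per-startup limit of the evaluation model

def seconds_until_limit(inferences_per_second, limit=EVAL_INFERENCE_LIMIT):
    """Seconds of continuous running before the evaluation limit is reached.

    Assumes one inference per processed frame (an unverified assumption).
    """
    if inferences_per_second <= 0:
        raise ValueError("inferences_per_second must be positive")
    return limit / inferences_per_second

if __name__ == "__main__":
    for fps in (8, 12):
        minutes = seconds_until_limit(fps) / 60
        print(f"{fps} FPS -> about {minutes:.1f} minutes per startup")
```

Under that assumption, 12 FPS works out to 1,125 seconds, a little under 19 minutes of continuous detection before a restart. That's plenty for a demo, less so for something left running all day.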

In Summary:

If you need to take a variety of models into account, you might be just fine getting your Raspberry Pi set up from scratch. This would make it a great resource for testing new models and really customizing your experience.

When you’re ready to ship, there’s no doubt that both the NCS2 and the Xnor.ai frameworks speed things up. There’s also no doubt that Xnor.ai outperformed the NCS2 in both onboarding and performance. I’m not sure what Xnor.ai’s pricing model is, but that would be the final factor in what is clearly a superior framework.