How should you do concurrency in an embedded no_std application? There’s no built-in support for time-sliced threads in core; that abstraction is only available in std (see std::thread). The latest stable release brought the async/await feature to no_std thanks to our compiler work. Should you always use that instead? Is cooperative scheduling appropriate for all kinds of embedded applications? What about applications with soft or hard real-time requirements?

If you have asked yourself these questions before, then this post is for you! We’ll look into a few commonly used embedded concurrency patterns and then into some less common ones. We’ll compare their runtime characteristics in practical terms so you get a better feel for which one is appropriate for your application.

Time-sliced threads

This is by far the most widely known and used concurrency pattern. If you have written a concurrent hosted application, that is an application that runs on top of an Operating System (OS), then you have directly or indirectly used time-sliced threads, as they are a concurrency / multitasking construct that virtually all OSes provide.

In embedded applications, threads are commonly used for multitasking, with one task mapped to one thread. Embedded devices usually have only one CPU, so adding threads does not result in parallelism (parallel execution of threads). Example usage below:

// thread / task 1 uses the radio interface
let (mut rtx, mut rrx) = radio::claim();
thread::spawn(move || loop {
    let mut buffer = [0; 64];
    let n = rrx.recv(&mut buffer);
    // omitted: prepare a response of `n` bytes in `buffer`
    rtx.send(&buffer[..n]);
});

// thread / task 2 uses the serial interface
let (mut stx, mut srx) = serial::claim();
thread::spawn(move || loop {
    let mut buffer = [0; 64];
    let n = srx.read(&mut buffer);
    // omitted: prepare a response of `n` bytes in `buffer`
    stx.write(&buffer[..n]);
});

In this example two independent tasks are executed concurrently by running each one in a separate thread. Here the threads do not need to communicate, but inter-thread communication is possible via atomics, mutexes or channels, as the sketch below shows.
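For instance, here’s a minimal sketch of inter-thread communication through a shared atomic counter. The thread::spawn API is the same hypothetical one used above; core::sync::atomic is real and available in no_std, though note that read-modify-write operations like fetch_add are not available on every target (ARMv6-M lacks them, for example).

use core::sync::atomic::{AtomicUsize, Ordering};

// counter shared by both threads; a `static` is visible from all threads
static PACKETS_HANDLED: AtomicUsize = AtomicUsize::new(0);

// thread / task 1 counts the radio packets it has handled
thread::spawn(move || loop {
    // omitted: receive a radio packet and respond to it
    PACKETS_HANDLED.fetch_add(1, Ordering::Relaxed);
});

// thread / task 2 periodically reports the count over the serial interface
thread::spawn(move || loop {
    let count = PACKETS_HANDLED.load(Ordering::Relaxed);
    // omitted: format `count` and write it to the serial interface
});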

Apart from familiarity, the other big advantage of threads is that you can use existing blocking APIs in threads without modification and still end up with a concurrent / multitasking application.

Currently, most of the existing Hardware Abstraction Layers (HALs) in the embedded crate ecosystem have only blocking APIs. One would then think that threads are the preferred way to do concurrency, as those blocking HALs can be used “as they are” in threaded applications. However, that’s not the case: there are no pure-Rust thread runtimes, and although there are Rust bindings to embedded OSes written in C that do provide a thread abstraction, they are not that widely used.
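For reference, a blocking API in this context usually has the shape sketched below: each operation busy-waits until the hardware has completed it, so nothing else can run on that thread in the meantime. The trait and method names here are made up for illustration; they are not taken from any particular HAL crate.

// illustrative shape of a blocking serial API; the names are hypothetical
pub trait BlockingSerial {
    type Error;

    /// Busy-waits until one byte has been received
    fn read_byte(&mut self) -> Result<u8, Self::Error>;

    /// Busy-waits until all the bytes have been transmitted
    fn write_all(&mut self, bytes: &[u8]) -> Result<(), Self::Error>;
}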

This lack of use of threads in embedded Rust may have to do with their main disadvantage: inefficient stack usage due to each thread having its own separate call stack. A chunk of RAM must be allocated to each thread as call stack space; the chunk must be large enough that the call stack does not grow beyond its reserved space and end up colliding with the call stack of a different thread.

Memory layout of a multi-threaded program

The above image depicts the memory layout of a multi-threaded program that uses the whole RAM available on an embedded device. At the bottom (smallest memory address) you’ll see the .bss+.data section; this section contains statically allocated variables. Right above that section you’ll see the heap. The memory allocator uses the heap space to dynamically allocate data like vectors (growable arrays) and (growable) strings. Dynamic memory allocation is optional in embedded applications, so the heap may not always be present; when it is present, it is usually assigned a fixed amount of memory. The rest of the memory is used for the call stacks of the threads. In this example the call stacks grow downwards, towards smaller addresses. As you can imagine, the stack of T1 could grow large enough to collide with the stack of T2; likewise, the stack of T3 could grow too large and collide with the heap. Some architectures provide runtime mechanisms to detect these collisions, but where collisions are not checked at runtime they can corrupt memory.

Embedded devices have limited amounts of RAM, usually on the order of tens of KBs. This greatly limits the number of threads that can be spawned if each thread gets a comfortably large, fixed stack like 4 KB. If, alternatively, all the RAM is split equally among N threads, each thread may end up with too little stack space, which increases the risk of stack overflows.
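To make that trade-off concrete, here’s a back-of-the-envelope stack budget for a hypothetical device with 64 KB of RAM. All the numbers are illustrative:

// hypothetical memory budget; adjust the numbers for your device
const RAM: usize = 64 * 1024;              // total RAM
const STATICS: usize = 8 * 1024;           // .bss + .data
const HEAP: usize = 8 * 1024;              // fixed-size heap, if present
const STACK_PER_THREAD: usize = 4 * 1024;  // fixed stack per thread

// at most 12 threads fit: (64 - 8 - 8) KB / 4 KB
const MAX_THREADS: usize = (RAM - STATICS - HEAP) / STACK_PER_THREAD;

Halving STACK_PER_THREAD doubles MAX_THREADS but also doubles the chance that some thread outgrows its stack, which is exactly the tension described above.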

In summary:

Pros of threads:

  • Popular concurrency model with a familiar API (std::thread)
  • Existing blocking code can be made to run concurrently without modification

Cons of threads:

  • Inefficient stack usage: one separate call stack per thread. More threads means a greater risk of a stack overflow (a collision between call stacks)
