When it comes to reducing cloud costs, optimizing bottlenecks in your code can only take you so far. It may be time to rethink your architecture. Perhaps you’re looking for a new architecture that takes into account the capabilities that modern hardware and software make available. One such architecture is called “thread-per-core”. Research recently demonstrated that a thread-per-core architecture can improve tail latencies of applications by up to 71%. That sounds fantastic, but the machine efficiency gains of thread-per-core can easily be negated by the loss of developer productivity when your application developers have to adjust to a completely new way of doing things and deal with a set of arcane challenges that are specific to this particular model.

Datadog is not immune to those problems. We run a variety of datastores at scale, which ingest metrics at a very high throughput. And we, too, were starting to see how some of our components’ existing architecture was beginning to show limitations. Metrics data, with its very high distribution in space, looks like a prime candidate for a thread-per-core architecture—but we were concerned about keeping the effort manageable.

This article will explore the thread-per-core model with its advantages and challenges, and introduce Glommio (you can also find it on crates.io), our solution to this problem. Glommio allows Rust developers to write thread-per-core applications in an easy and manageable way.

What is thread-per-core?

We know that thread-per-core can deliver significant efficiency gains. But what is it? In simple terms, any moderately complex application has many tasks that it needs to perform: it may need to read data from a database, feed that data through a machine learning model, and then pass that result along the pipeline. Some of those tasks are naturally sequential, but many can be done in parallel. And since modern hardware keeps increasing the number of cores available for applications, it is important to efficiently use them to achieve good performance numbers.

The simplest and most time-tested way of doing that is by employing threads: for each of its internal tasks, the application may use a different thread. If a thread has available work to do, it will do it; otherwise, it will go to sleep and allow the next one to run.

#rust

Introducing Glommio, a Thread per Core Crate for Rust & Linux
3.90 GEEK