This post covers:

  • Benchmark reports for contributors
  • Benchmark reports for users
  • Profiling with valgrind / kcachegrind
  • Reproducible benchmarks and graphics
  • Tips for benchmark behavior and benchmarking other languages

Lots of libraries advertise how performant they are with phrases like “blazingly fast”, “lightning fast”, or “10x faster than y” – oftentimes right in the project’s main description. If performance is a library’s main selling point, then I expect instructions for reproducible benchmarks and lucid visualizations. Nothing less. Otherwise it’s all talk and no action, especially because great benchmark frameworks exist in nearly every language.

I find performance-touting libraries without a benchmark foundation analogous to GUI libraries without screenshots.

This post mainly focuses on creating satisfactory benchmarks in Rust, but the main points here can be extrapolated.

Use Criterion

If there is one thing to take away from this post: benchmark with Criterion.

Never written a Rust benchmark? Use Criterion.

Only written benchmarks against Rust’s built-in bench harness? Switch to Criterion:

  • Benchmarks run on stable Rust (I personally have eschewed nightly Rust for the last few months!)
  • Criterion reports statistically significant changes between runs (useful for testing branches or varying implementations)
  • Criterion is actively developed

Get started with Criterion
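Never set it up before? The short version: add criterion as a dev-dependency in Cargo.toml, declare a [[bench]] target with harness = false, and drive the benchmark with Criterion’s macros. Here is a minimal sketch, assuming a benches/parsing.rs file; parse_u64 is a placeholder of mine, not code from any particular project:

// benches/parsing.rs -- minimal Criterion benchmark sketch.
// Cargo.toml (not shown) needs criterion under [dev-dependencies] and a
// [[bench]] entry named "parsing" with harness = false.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Stand-in for whatever function you actually want to measure.
fn parse_u64(data: &[u8]) -> u64 {
    data.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b))
}

fn bench_parse(c: &mut Criterion) {
    let data = [0xABu8; 8];
    c.bench_function("parse_u64", |b| {
        // black_box keeps the compiler from optimizing the work away.
        b.iter(|| parse_u64(black_box(&data)))
    });
}

criterion_group!(benches, bench_parse);
criterion_main!(benches);

Run it with cargo bench; each run is compared against the previous one, which is where the “change” section in the output below comes from.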

When running benchmarks, the command-line output will look something like:

sixtyfour_bits/bitter_byte_checked
                        time:   [1.1052 us 1.1075 us 1.1107 us]
                        thrpt:  [6.7083 GiB/s 6.7274 GiB/s 6.7416 GiB/s]
                 change:
                        time:   [-1.0757% -0.0366% +0.8695%] (p = 0.94 > 0.05)
                        thrpt:  [-0.8621% +0.0367% +1.0874%]
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  6 (6.00%) high severe
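Those throughput numbers don’t appear by default; they show up when the benchmark declares how many bytes (or elements) each iteration processes. A rough sketch of that pattern, where the group and function names simply mirror the output above and parse_all is a stand-in rather than bitter’s actual benchmark code:

use criterion::{black_box, criterion_group, criterion_main, Criterion, Throughput};

// Stand-in for the parsing routine being measured.
fn parse_all(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

fn bench_throughput(c: &mut Criterion) {
    let data = vec![0u8; 1 << 20];
    let mut group = c.benchmark_group("sixtyfour_bits");
    // Declaring bytes per iteration makes Criterion report GiB/s next to the timings.
    group.throughput(Throughput::Bytes(data.len() as u64));
    group.bench_function("bitter_byte_checked", |b| {
        b.iter(|| parse_all(black_box(&data)))
    });
    group.finish();
}

criterion_group!(benches, bench_throughput);
criterion_main!(benches);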

Console output like this is good for contributors in pull requests or issues, but I’d better not see it in a project’s readme! Criterion automatically generates reports that are 100x better than console output.

Criterion Reports

Below is a Criterion-generated plot from one of my projects: bitter. I’m including only one of the nearly 1800 graphics generated by Criterion; the one I chose captures the heart of a single benchmark comparing Rust bit parsing libraries across read sizes (in bits).

This chart shows the mean measured time for each function as the input (or the size of the input) increases.
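A chart like this generally comes from a parameterized benchmark group: each library’s read function is measured over the same set of input sizes, and Criterion’s report draws one line per function across those sizes. A hedged sketch of that shape, assuming placeholder functions parse_a and parse_b rather than bitter’s real benchmark suite:

use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};

// Placeholder parsers standing in for the libraries under comparison.
fn parse_a(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

fn parse_b(data: &[u8]) -> u64 {
    data.iter().fold(0u64, |acc, &b| acc.wrapping_add(u64::from(b)))
}

fn bench_read_sizes(c: &mut Criterion) {
    let mut group = c.benchmark_group("read_sizes");
    for &size in &[1usize, 2, 4, 8, 16, 32] {
        let data = vec![1u8; size * 1024];
        // Each (function, size) pair becomes its own measurement in the report.
        group.bench_with_input(BenchmarkId::new("parse_a", size), &data, |b, d| {
            b.iter(|| parse_a(black_box(d)))
        });
        group.bench_with_input(BenchmarkId::new("parse_b", size), &data, |b, d| {
            b.iter(|| parse_b(black_box(d)))
        });
    }
    group.finish();
}

criterion_group!(benches, bench_read_sizes);
criterion_main!(benches);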

Out of all the auto-generated graphics, I would consider this the only visualization that could be shown to a more general audience, but I still wouldn’t use it that way. The chart lacks context, and it’s not clear what the graphic is trying to convey. I’d even worry about readers drawing inappropriate conclusions (pop quiz time: one library is superior across all parameters – which one is it?).

It’s my opinion that the graphics Criterion generates are perfect for contributors to the project, as there is no dearth of info. Criterion generates graphics that break down mean, median, standard deviation, MAD, etc., which are invaluable when trying to pinpoint areas for improvement.

As a comparison, here is the graphic I created using the same data:

It may be hard to believe that this is the same data, but here are the improvements:

  • A more self-explanatory title
  • Stylistic differentiation between “us” and “them”: in the graphic above, bitter methods are solid lines while the rest are dashed
  • More accessible x and y axis values
  • Eyes are drawn to the upper right, where the standout throughput values show bitter in a good light and make it clearer which libraries perform better

These add context that Criterion shouldn’t be expected to know. I recommend spending the time to dress up reports before presenting them to a wider audience.
