Hyperfine: A command-line benchmarking tool

hundredwatt | 235 points

Perhaps interesting (for some) to note that hyperfine is from the same author as at least a few other "ne{w,xt} generation" command line tools (that could maybe be seen as part of "rewrite it in Rust", but I don't want to paint the author with a brush they disagree with!!): fd (find alternative; https://github.com/sharkdp/fd), bat ("supercharged version of the cat command"; https://github.com/sharkdp/bat), and hexyl (hex viewer; https://github.com/sharkdp/hexyl). (And certainly others I've missed!)

Pointing this out because I myself appreciate comments that do this.

For myself, `fd` is the one most incorporated into my own "toolbox" -- used it this morning prior to seeing this thread on hyperfine! So, thanks for all that, sharkdp if you're reading!

Ok, end OT-ness.

ratrocket | 3 days ago

Hyperfine is a great tool, but when I was using it at Deno to benchmark startup time, there was a lot of weirdness around the operating system apparently caching the inodes of executables.

If you are trying to shave sub-20ms numbers, be aware that you may need to pull tricks, on macOS especially, to get real numbers.
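
For anyone hitting the same thing, hyperfine's warmup and prepare hooks are the levers to reach for. A rough sketch (the hello.ts workload is a placeholder; `sudo purge` flushes the macOS disk cache and needs credentials):

```sh
# Warm-start numbers: let a few untimed runs populate the OS caches
hyperfine --warmup 10 'deno run hello.ts'

# Cold-start numbers: flush the disk cache before every timed run
# (macOS; on Linux, something like 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches')
hyperfine --prepare 'sudo purge' 'deno run hello.ts'
```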

mmastrac | 3 days ago

I've also had a good experience using the 'perf'[^1] tools when I don't want to install 'hyperfine'. Shameless plug for a small blog post about it, as I don't think this approach is that well known: https://usrme.xyz/tils/perf-is-more-robust-for-repeated-timi....
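
The short version, as a minimal sketch (the binary name is a placeholder; see perf-stat(1) for details):

```sh
# Run the command 10 times; perf reports mean and stddev across runs
perf stat -r 10 -- ./my-tool --some-flag

# --null skips the hardware counters and just times the runs
perf stat -r 10 --null -- ./my-tool --some-flag
```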

---

[^1]: https://www.mankier.com/1/perf

usrme | 3 days ago

Hyperfine is great. I sometimes use it for quick web page benchmarks:

https://abuisman.com/posts/developer-tools/quick-page-benchm...

As mentioned elsewhere in the thread, it is not the best approach when you are chasing single-millisecond optimisations, since there is a lot of overhead (especially the way I demonstrate here), but it works very well for sanity checks.
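
The core of the trick, as a sketch (the URLs are placeholders; note that curl's own startup is part of what gets timed, hence the overhead caveat):

```sh
# Compare two endpoints; hyperfine handles warmup and repetition
hyperfine --warmup 3 \
  'curl -s -o /dev/null https://example.com/old-page' \
  'curl -s -o /dev/null https://example.com/new-page'
```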

mosselman | 3 days ago

The comment about statistics that I wanted to reply to has disappeared. That commenter said:

> I stand firm in my belief that unless you can prove how CLT applies to your input distributions, you should not assume normality. And if you don't know what you are doing, stop reporting means.

I agree. My research group stopped using Hyperfine because it ranks benchmarked commands by mean, and provides standard deviation as a substitute for a confidence measure. These are not appropriate for heavy-tailed, skewed, and otherwise non-normal distributions.

It's easy to demonstrate that most empirical runtime distributions are not normal. I wrote BestGuess [0] because we needed a better benchmarking tool. Its analysis provides measures of skew, kurtosis, and Anderson-Darling distance from normal, so that you can see how normal (or not) your distribution is. It ranks benchmark results using non-parametric methods. And, unlike many tools, it saves all of the raw data, making it easy to re-analyze later.

My team also discovered that Hyperfine's measurements are a bit off: it reports longer run times than other tools, including BestGuess. I believe this is due to its approach, which is to call getrusage(), then fork/exec the program being measured, then call getrusage() again. The difference in user and system times is reported as the time used by the benchmarked command, but unfortunately that difference also includes cycles spent in the Rust code that manages processes (after the fork but before the exec).

BestGuess avoids external libraries (we can see all the relevant code), does almost nothing after the fork, and uses wait4() to get measurements. The one call to wait4() gives us what the OS measured by its own accounting for the benchmarked command.

While BestGuess is still a work in progress (not yet at version 1.0), my team has started using it regularly. I plan to continue its development, and I'll write it up soon at [1].

[0] https://gitlab.com/JamieTheRiveter/bestguess
[1] https://jamiejennings.com

jamietheriveter | 2 days ago

A capable alternative based on "boring, old" technology is multitime [1].

Back when I needed it, it could report peak memory usage, which hyperfine was not able to show. Maybe that has changed by now.
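
Usage stays close to plain time(1). A minimal sketch (the binary name is a placeholder; check the page below for the full flag list):

```sh
# Run the command 10 times and report statistics across the runs
multitime -n 10 ./my-tool --some-flag
```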

[1] https://tratt.net/laurie/src/multitime/

smartmic | 3 days ago

"Hyperfine seems like an incredibly useful tool for anyone working with command-line utilities. The ability to benchmark processes straightforwardly is vital for optimizing performance. I’m particularly impressed with how simple it is to use compared to other benchmarking tools. I’d love to see more examples of how Hyperfine can be integrated into different workflows, especially for large-scale applications.

shawndavidson7 | 3 days ago

Hyperfine is great! I remember learning about it when comparing functions with and without tail recursion (not sure if it was from the Go reference or the Rust reference). It provides simple configuration for quick comparisons like that. But I have not tried it on a DBMS (e.g., the way sysbench is used). Has anyone tried that?
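
For the tail-recursion case, the whole benchmark is a one-liner (a sketch; the binary names are hypothetical):

```sh
# Rank two builds of the same function by mean run time
hyperfine './fib-plain 35' './fib-tailrec 35'
```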

edwardzcn | 3 days ago

What database do people here commonly load benchmark results into? This tool is great, but I'd love to analyze results relationally.
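
In the meantime, hyperfine's CSV export plus sqlite3 gets part of the way there. A sketch (I'd verify the column names against hyperfine's actual CSV output before relying on them):

```sh
# Export per-command summary statistics to CSV...
hyperfine --export-csv results.csv 'grep -r foo .' 'rg foo'

# ...then query them relationally
sqlite3 :memory: \
  '.mode csv' \
  '.import results.csv results' \
  'SELECT command, mean, stddev FROM results ORDER BY mean;'
```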

7e | 3 days ago

Hyperfine is a really useful tool.

The weirdest thing I've used it for is comparing I/O throughput on various disks.
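
A sketch of what that looks like (mount points hypothetical; the cache-drop trick is Linux-specific and needs root):

```sh
# Compare sequential write throughput on two disks; drop the page
# cache between runs so every timed run starts cold
hyperfine --prepare 'sync; echo 3 > /proc/sys/vm/drop_caches' \
  'dd if=/dev/zero of=/mnt/ssd/test.img bs=1M count=1024 conv=fdatasync' \
  'dd if=/dev/zero of=/mnt/hdd/test.img bs=1M count=1024 conv=fdatasync'
```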

accelbred | 3 days ago

Hyperfine is hyper frustrating because it only works with really, really coarse millisecond-level benchmarks. Once you get into the microsecond range it's worthless.

forrestthewoods | 3 days ago