Waiting for many things at once with io_uring

ashvardanian | 118 points

io_uring and Linux's many different types of file descriptors are great. I mean, I personally think that the explicit large API surface of WinNT is kinda nicer than jamming a bunch of weird functionality into files and file descriptors like Linux, but when things work, they do show some nice advantages of unifying everything under one framework, ill-fitting as it may sometimes be (though now that I say this, it's not like WinNT Objects are really any different here, they just offer more advanced baseline functionality like ACLs). io_uring and its ability to tie together a lot of pre-existing things in new ways is pretty cool. UNIX never really had a story for async operations, something I will not fault an OS designed 50 years ago for. However, still not having a decent story for async operations today is harder to excuse. I've been excited to learn about io_uring. I've learned a lot listening to conference talk recordings about it. While it has its issues (like the many times it (semi-?)accidentally bypassed security subsystems...) it has some really cool and substantial benefits.

I'll tell you what I would love to see next: a successor to inotify that does not involve opening one zillion file descriptors to watch a recursive subtree. I'm sure there are valid reasons why it's not easy to just make it happen, but it feels like it would be a major improvement in a lot of use cases. And in many cases, it would probably fix the dreaded problem of users needing to fight against ulimits, especially in text editors like VSCode.

I don't have anything of great substance to say about the actual subject of the article. It feels a bit late to finally get this functionality proper in Linux after NT had it basically forever, but any improvement is welcome. Next time I'm doing something where I want to wait on a bunch of FDs I will have to try this approach.

jchw | 3 days ago

Wikipedia:

> In June 2023, Google's security team reported that 60% of the exploits submitted to their bug bounty program in 2022 were exploits of the Linux kernel's io_uring vulnerabilities. As a result, io_uring was disabled for apps in Android, and disabled entirely in ChromeOS as well as Google servers.[11] Docker also consequently disabled io_uring from their default seccomp profile.[12]

Root privilege CVE from earlier this year (2024): https://nvd.nist.gov/vuln/detail/CVE-2024-0582

KerrAvon | 3 days ago

It took me many io_uring hello world articles to find out it's not really used in production (e.g. Android and ChromeOS both disable it) because it was, and continues to be, a source of an absolutely bonkers outsized number of security issues.

I don't remember much more than that*, but just dropping it here because I learned a ton more from reading about that, than my Nth io_uring article.

* for example, the article mentioning relevant buffers are shared with the system made me want to say "aHA, yes, that's what the security articles said was a core issue!" -- but I can't actually remember with 100% confidence

refulgentis | 3 days ago

Discussion thread in the Erlang community proposing implementing io_uring for BEAM, security issues, and a digression comparing it to FreeBSD's kqueues

https://erlangforums.com/t/erlang-io-uring-support/765/18?pa...

hosh | 3 days ago

Some of the things that you cannot wait on using io_uring are your kernel actually supporting the feature mentioned in the article, io_uring actually working properly, and io_uring solving its seemingly bottomless supply of local user exploits. In the early days of this feature I was bullish but the way its implementation has emitted CVEs has not been a source of joy, and now many major Linux operators have banned the API internally. Maybe what is needed is a moment of reflection and a scratch reimplementation that learns the lessons of io_uring?

jeffbee | 3 days ago

Surprisingly, I only came across Francesco's blog this month. I stumbled upon the 2021 post "Speeding up atan2f by 50x" while searching for others who have to reimplement trigonometry in SIMD every other year. I've also enjoyed "Beating the L1 cache with value speculation" from the same year, as well as the 2013 Agda sorting example.

Highly recommend checking it out: https://mazzo.li/archive.html

ashvardanian | 3 days ago

When I needed something similar to that for older Linux kernels, I used primitives based on file descriptors (eventfd for manually reset events, pidfd_open to wait for completion of processes, mq_open for sending messages), and poll() to wait for multiple things with one system call.

Const-me | 2 days ago

Very interesting, but it's unfortunate there is no example program. I guess that is left as an exercise for the reader, but it's a bit daunting for a non-systems programmer.

4hg4ufxhy | 3 days ago

It's a shame io_uring is proving to be such a disappointment. It's been over two decades now that Linux has been trying to catch up with the NT Kernel's IO Completion Ports and we're still not there.

On the plus side, this submission somehow reminded me about ACE[1], which is where I first came across the Proactor[2]/Reactor distinction. Good times!

[1] https://www.dre.vanderbilt.edu/~schmidt/ACE.html

[2] https://www.dre.vanderbilt.edu/~schmidt/PDF/Proactor.pdf

User23 | 3 days ago

[dead]

c0detrafficker | 3 days ago