My Own Private Binary: An Idiosyncratic Introduction to Linux Kernel Modules

spudlyo | 277 points

This is a long essay, and here is my pitch as to why you should read the whole thing if you have any interest in subjects like C programming, binary formats, kernel modules, or assembler.

Breadbox, the author, wants to make smaller binary executables. He covers ELF binaries, a.out binaries, and old MS-DOS .COM binaries, and explains how the latter had no metadata and could be very small. He then explains how you can dynamically load code that handles new executable binary formats into the Linux kernel, and how this process works. He walks through some sample C for building a "Hello World" kernel module. He then walks you through ~1 page of code for a kernel module that registers a new binary format, sets up some callbacks, and, if conditions are right, will vm_mmap() the code into memory and call start_thread() on it.
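
The "register a format, then get called back on exec" flow can be modeled in user space. This is only a sketch of the dispatch idea, not kernel code: the handler names, the `FORMATS` list, and the `.com` suffix check are all assumptions made for illustration.

```python
import errno

# Each handler inspects the file and either claims it (returns a label)
# or declines with -ENOEXEC, meaning "not mine, ask the next handler".

def load_elf(filename, head):
    return "elf" if head.startswith(b"\x7fELF") else -errno.ENOEXEC


def load_script(filename, head):
    return "script" if head.startswith(b"#!") else -errno.ENOEXEC


def load_flat(filename, head):
    # Hypothetical rule: a metadata-free format has no magic number,
    # so this sketch recognizes it by file name alone.
    return "flat" if filename.endswith(".com") else -errno.ENOEXEC


FORMATS = [load_elf, load_script]


def register_format(handler):
    """Stand-in for a module registering its format when it is loaded."""
    FORMATS.append(handler)


def exec_file(filename, head):
    """Ask each registered handler in turn; the first to claim the file wins."""
    for handler in FORMATS:
        result = handler(filename, head)
        if result != -errno.ENOEXEC:
            return result
    return -errno.ENOEXEC  # nobody recognized it: "Exec format error"


def demo():
    before = exec_file("tiny.com", b"\xb0\x2a")
    register_format(load_flat)
    after = exec_file("tiny.com", b"\xb0\x2a")
    return before, after
```

In this model, `demo()` shows the same file being rejected before the new handler is registered and accepted afterwards, which is the effect of inserting the module.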

Yay, it works! He has a tiny binary. This is where most articles would end, but Breadbox goes deeper. What if you want a stack and a heap? What if you want to access argc, argv, and envp? What if you want to append code at the end that automatically calls the exit syscall? All these details are covered, and I think it's glorious.

While this all may seem like pretty dry stuff, there is humor sprinkled throughout, which makes it more fun to read.

spudlyo | 6 months ago

This article is fantastic.

And it pairs well with another article on the front page. [0]

Which I bring up because they disagree on a particular point. And that is how a script without a shebang gets run as a script.

> This is done by registering a set of callback functions, and these callbacks get invoked when the kernel is asked to execute a binary file. The kernel invokes the callbacks on this list, and the first one that claims to recognize the file takes responsibility for getting it properly loaded into memory. If nobody on the list accepts it, then as a last resort the kernel will attempt to treat it as a shell script without a shebang line. And if that doesn't fly, then you'll get that "Exec format error" message described above.

But the article I linked to says the shell actually handles it. And based off of its research (terribly reproduced below), I'm inclined to believe it.

    echo echo Hello world > test.sh
    chmod +x test.sh
    strace ./test.sh
    strace sh -c ./test.sh

You'll see the first one errors with `ENOEXEC`, but the second one does not. Also, in my head, I don't know how the kernel would know what shell to choose, or that it should even expect to have access to a shell.
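
The difference can also be reproduced without strace. A minimal sketch, assuming a Linux system with `/bin/sh` and a temp directory that is not mounted noexec; the function names are made up for illustration. The retry in `run_like_a_shell` only mimics the fallback a shell performs on `ENOEXEC` and is not any particular shell's actual code.

```python
import errno
import os
import subprocess
import tempfile


def make_script(directory):
    """Create an executable file with no shebang line, like test.sh above."""
    path = os.path.join(directory, "test.sh")
    with open(path, "w") as f:
        f.write("echo Hello world\n")
    os.chmod(path, 0o755)
    return path


def run_directly(path):
    """Exec the file with no shell involved; return the errno of the failure."""
    try:
        subprocess.run([path], capture_output=True)
    except OSError as e:
        return e.errno
    return 0  # the exec itself succeeded


def run_like_a_shell(path):
    """Try a direct exec first; on ENOEXEC, hand the file to /bin/sh instead."""
    try:
        result = subprocess.run([path], capture_output=True, text=True)
    except OSError as e:
        if e.errno != errno.ENOEXEC:
            raise
        result = subprocess.run(["/bin/sh", path], capture_output=True, text=True)
    return result.stdout


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = make_script(d)
        return run_directly(path), run_like_a_shell(path)
```

If the direct attempt reports `ENOEXEC` while the retry prints the greeting, the fallback is happening in user space rather than in the kernel's exec path.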

[0]: https://news.ycombinator.com/item?id=43646698

jmholla | 6 months ago

You can do better than 2 bytes. Use the same epilogue, but store a copy of the "binary" just before the stack pointer and offset the instruction pointer from the start of the binary by 1 byte. If you use the binary consisting of literally a one-byte value, 0x2A (i.e. 42), then your first instruction will be the first instruction of the epilogue which will pop the "binary" into RDI setting RDI to 42. There are maybe some details in the alignment, padding, and instruction choice in the loader to make that work "generically", but that strategy should work and give you a 1-byte solution.

edit: Actually, just define your binary format so that the first byte is copied to the stack and all subsequent bytes are copied to text with the epilogue appended to it.

edit: You could also define it so that the first byte is copied into the first argument register/RDI if you want to shrink loaded RAM usage to just 4 bytes of code and 1 byte of data.

This is of course assuming it is a "generic" binary format that is not literally just encoding the contents of the tiny program. Otherwise you could do 0 bytes and just have the loader pre-fill RAX with 60 and RDI with 42 and insert a one-instruction epilogue consisting of syscall. You could technically still call that a "generic" binary format since any actual binary you attempt to load will just blow away those pre-filled GPR values.

Veserv | 6 months ago

COM files on Windows are always 16-bit. His COM files appear to use the native bit width of the kernel. This means that, unlike on Windows, his COM files cannot execute on both 32-bit and 64-bit versions of the kernel. That one imperfection aside, this is a fantastic achievement.

ryao | 6 months ago

The appendix to this is also good, and goes over things like getting linker scripts to create binaries using objdump and writing C wrappers for syscalls: https://www.muppetlabs.com/~breadbox/txt/mopb-app.html

HeliumHydride | 6 months ago

This is a very good read and excellent in that we hope everyone knows about these things -- how computers actually work and how efficient and simple things can be -- but some readers probably don't, and this wonderfully accessible write-up is a good way to learn. And for those who know most of these details it is wonderfully refreshing.

stmw | 6 months ago

This is amazing and I wish I had access to this resource months ago when I explored a new binary format as well.

setheron | 6 months ago

> Traditionally, programs will place their code into non-writeable memory, and store variable data in memory that is writeable but not executable. And that's definitely the safer way to do things, but we can't be bothered with all that.

Woah, I have a feeling this does something even more. If the program modifies its own instructions, the kernel will probably save those modifications in the file too.
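
Whether such writes reach the file depends on how the image is mapped. A private (copy-on-write) mapping keeps modifications in memory only, while a shared mapping writes them back to the file. Which flag the article's module passes to vm_mmap() is not stated in this thread, so treat the following as a user-space sketch of the two behaviors only; `patch_first_byte` and the sample bytes are hypothetical.

```python
import mmap
import os
import tempfile


def patch_first_byte(path, flags):
    """Map the file writable, overwrite byte 0, unmap, then re-read the file."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, flags=flags,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[0] = 0x90  # modify the mapped image in memory
    with open(path, "rb") as f:
        return f.read()


def demo():
    original = b"\xb0\x2a\x0f\x05"  # arbitrary placeholder contents
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "tiny.com")
        with open(path, "wb") as f:
            f.write(original)
        # Private mapping first, so the file is still pristine when it runs.
        after_private = patch_first_byte(path, mmap.MAP_PRIVATE)
        after_shared = patch_first_byte(path, mmap.MAP_SHARED)
    return original, after_private, after_shared
```

With a private mapping the file on disk should come back unchanged, so a self-modifying program would only alter its own in-memory copy.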

amstan | 6 months ago

Also interesting - how to make a single, small executable that can run natively on Windows, Linux, Mac, etc:

https://news.ycombinator.com/item?id=32648359

https://github.com/jart/cosmopolitan

https://en.m.wikipedia.org/wiki/Fat_binary

rkagerer | 6 months ago

> For example, one time while working on my kernel module, I accidentally put --i instead of ++i in the iterator of my for loop. I inserted that module into my kernel to test it, and my mouse cursor disappeared, and my music stopped playing … and then it was time to reboot my computer

I'd recommend using QEMU for the type of work the author is doing. It makes iteration much faster.

bhawks | 6 months ago

The first kernel module I developed was based on a blog post[0] from Oracle of all people.

0: https://blogs.oracle.com/linux/post/introduction-to-netfilte...

nazgulsenpai | 6 months ago

Very nice read, thanks for sharing! I will immediately give the link to my systems & networks students. Just a few weeks ago I taught them how to write basic kernel modules. This is a very cool addendum to that class :).

p4bl0 | 6 months ago

As an amateur Linux user I've long thought of these .ko files and many other binaries as "magic", but no more! This article presents the concepts very naturally so it was easy to absorb.

Liftyee | 6 months ago

I would just name the kernel modules properly: comexec and calmexec. Or crownexec.

rurban | 6 months ago