Machine Code Isn't Scary

surprisetalk | 204 points

Reading this thread leaves me with the impression that most posters advocating learning assembly language have never had to use it in a production environment. It sucks!

For the overwhelming majority of programmers, assembly offers absolutely no benefit. I learned (MC6809) assembly after learning BASIC. I went on to become an embedded systems programmer in an era where compilers were still pretty expensive, and I worked for a cheapskate. I wrote an untold amount of assembly for various microcontrollers over the first 10 years of my career. I honestly can't say I got any more benefit out of it than programming in C; it just made everything take so much longer.

I once, for a side gig, had to write a 16-bit long-division routine on a processor with only one 8-bit accumulator. That was the point at which I declared that I'd never write another assembly program. Luckily, by then gcc supported some smaller processors, so I could switch to the Atmel AVR series.

HeyLaughingBoy | 2 days ago

I have tried to convince people that ASM is reasonable as a first-stage teaching language. Its reputation as a nearly mystical art practiced by a few doesn't help. The thing is, the instructions are simple. Getting them to do things is not hard; the difficulty comes from tasks exceeding the scale at which you can think about things at their most basic level.

It quickly becomes tedious to do large programs - not really hard, just unmanageable - which is precisely why it should be taught as a first language. You learn how to do simple things, and you learn why programming languages are used. You teach the problem that is being solved before teaching the more advanced programming concepts that solve it.

Lerc | 2 days ago

Here's a similar (and much more in-depth) opcode-decoding recipe for the Z80, very useful for emulator development:

http://www.z80.info/decoding.htm

For actually programming in machine code, though, this understanding of the internal opcode structure isn't all that useful. Usually - without an assembler at hand - you had a lookup table with all possible assembly instructions on the left side and the corresponding machine code bytes on the right side.

Programming by typing machine code into a hex editor is possible, but really only recommended as an absolute fallback if there's no assembler at hand - mainly because you had to keep track of all global constant and subroutine entry addresses (i.e. the main thing that an assembler does for you), and you had to leave gaps at strategic locations so that it was possible to patch the code without having to move things around.
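
That lookup-table workflow can be sketched as a toy hand-assembler. The three encodings included are real Z80 opcodes; everything else (the function names, the tiny input format) is made up for illustration:

```python
# Toy hand-assembler: mimics looking up mnemonics in a printed opcode
# table and writing the machine-code bytes out by hand. Only a handful
# of (real) Z80 encodings are included.
TABLE = {
    "NOP":   b"\x00",
    "INC A": b"\x3c",
    "RET":   b"\xc9",
}

def assemble(lines):
    code = bytearray()
    for line in lines:
        line = line.strip()
        if line.startswith("LD A,"):   # LD A,n encodes as 3E nn
            code += b"\x3e" + bytes([int(line.split(",")[1], 0)])
        else:
            code += TABLE[line]
    return bytes(code)

print(assemble(["LD A,41", "INC A", "RET"]).hex())  # 3e293cc9
```

A real assembler does the same lookup, plus the bookkeeping of labels and addresses that the comment above describes.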

flohofwoe | 2 days ago

For the past year or so, a couple of teen boys from my neighborhood have come by on Sunday afternoons for a couple hours of programming in Python. I started very simply and built up with text-based tasks, then showed them pygame.

I am thinking about showing them what is under the hood - that Python itself is just a program. When I learned to program it was the late 70s, and TRS-80s and Apple IIs were easy to understand at the machine-code level.

I could recapitulate that experience for them, via an emulator, but that again just feels like an abstraction. I want them to have the bare-metal experience. But x86 is such a sprawling, complicated instruction set that it is very intimidating. Of course I'd stick to a simplified subset of the instructions, but even then, it seems like a lot more work to get output on the screen on a PC than on the old 8-bit machines, where you write to a specific memory location and it shows up on the screen.
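
The memory-mapped-screen experience mentioned here is easy to simulate. A minimal sketch, with an entirely made-up memory layout: a flat byte array where writes to a reserved range appear as characters when the "screen" is drawn, roughly like POKEing the video RAM of an 8-bit machine:

```python
# Toy "memory-mapped screen": bytes written into a reserved range of a
# flat memory array show up as characters when the screen is rendered.
SCREEN_BASE, COLS, ROWS = 0x0400, 16, 4   # made-up layout

memory = bytearray(0x1000)                # flat 4 KB address space

def poke(addr, value):
    memory[addr] = value & 0xFF

def render():
    # Draw the reserved range as a character grid, '.' for empty bytes.
    rows = []
    for r in range(ROWS):
        start = SCREEN_BASE + r * COLS
        row = bytes(memory[start:start + COLS]).replace(b"\x00", b".")
        rows.append(row.decode("ascii"))
    return "\n".join(rows)

for i, ch in enumerate(b"HELLO"):         # "poke the video RAM"
    poke(SCREEN_BASE + i, ch)
print(render().splitlines()[0])           # HELLO...........
```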

tasty_freeze | 2 days ago

ASM programming is fun. Machine code (as in what ASM encodes to) isn't scary, but it is extremely tedious to work with. I recommend the first part of Casey Muratori's Performance Aware Programming course if you want to feel that pain.

jebarker | 2 days ago

Machine code isn't scary, but its nature is severely misunderstood.

Skipping over the bundling of instructions into code blocks, the next logical construct is the function. Functions have references to code and data in memory; if you want to move functions around in memory, you introduce the concept of relocations to annotate these references, and of a linker to fix them to a particular location.

But once the linker has done its job, the function is no longer relocatable; you can't move it around... or that is what someone sane might say.

If you can undo the work of the linker, you can extract relocatable functions from executables. These functions can then be reused in new executables without decompiling them first; after all, if what you've extracted is equivalent to the original relocatable function, you can do the same things with it.

Repeat this process over the entire executable and you've stripped it for parts, ready to be put back together with the linker. Change some parts and you have the ability to modify it as if you were replacing object files, instead of binary patching it in place with all the constraints that come with that.

Machine code is like Lego bricks, it just takes a rather unconventional point of view (and quite a bit of time to perfect the art of delinking) to realize it.
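
The link/delink idea can be shown in miniature. This is a hedged sketch, not the commenter's actual tooling: the "function" bytes are invented, and only the fix-up mechanics matter. A function is code bytes plus a relocation list marking where absolute addresses get patched; linking patches them in, delinking zeroes them back out:

```python
import struct

# Toy model of linking and delinking: a "function" is machine-code
# bytes plus a relocation list recording where absolute addresses live.
func = bytearray(b"\x01\x02" + b"\x00\x00\x00\x00" + b"\x03")
relocs = [2]                    # offset of a 4-byte absolute address

def link(code, relocs, base):
    out = bytearray(code)
    for off in relocs:
        struct.pack_into("<I", out, off, base)   # fix reference to base
    return bytes(out)

def delink(code, relocs):
    out = bytearray(code)
    for off in relocs:
        struct.pack_into("<I", out, off, 0)      # undo the fix-up
    return bytes(out)

linked = link(func, relocs, 0x8000)
assert delink(linked, relocs) == bytes(func)     # relocatable again
```

The hard part in practice, of course, is recovering the relocation list from a linked binary; here it is given.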

boricj | 2 days ago

I started building a Forth recently, but decided that instead of an interpreter or transpiler or whatever, I'd map to bytes in memory and just execute them directly.

This non-optimising JIT has been far, far easier than all the scary articles and comments I've seen had led me to believe.

I'm already in the middle of making it work on both AArch64 and RISC-V, a couple weeks in.
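
The core trick of such a JIT - map bytes into executable memory and jump to them - fits in a few lines. A minimal sketch, assuming Linux on x86-64 (and that the OS permits a read/write/execute mapping); the bytes encode `mov eax, 42 ; ret`:

```python
import ctypes
import mmap
import platform

# mov eax, 42 ; ret  (x86-64 machine code)
CODE = b"\xb8\x2a\x00\x00\x00\xc3"

def jit_fortytwo():
    # Map an anonymous read/write/execute page and copy the code in.
    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(CODE)
    # Treat the page's address as a C function returning int, and call it.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    fn = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
    return fn()

if platform.machine() in ("x86_64", "AMD64"):
    print(jit_fortytwo())  # 42
```

A real Forth-style JIT does the same thing repeatedly, appending the code for each word into the buffer as it is compiled.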

shakna | 2 days ago

I taught myself to program on an 8-bit BBC micro-computer in the mid-80s by typing in BASIC listings. I understood BASIC quite well, and could write my own structured BASIC programs, but machine code was always a bit out of reach. I would try to read books that started by demonstrating how to add, subtract etc., but I couldn’t see how that could build up to more complicated stuff that I could do in BASIC, like polling for input, or playing sounds, or drawing characters on the screen. Only once I got an advanced user's guide and discovered the operating system commands did it start to click with me - the complicated stuff was just arranging all the right data in the right bits of memory or registers, then (essentially) calling a particular OS command and saying ‘here’s the data you want’.

PlunderBunny | 2 days ago

In 1982, I programmed my ZX81 by converting assembly to hex by hand, because BASIC was just too slow. I'd write my assembly on paper, convert it to hex using reference tables, then use a simple BASIC FOR loop to POKE the values into memory we'd reserved for the machine code: a REM statement at a fixed position in memory.

When all the values were POKEd in, I'd save to tape and execute it with RAND USR 16514.

That memory address is permanently etched in my brain even now.

It wasn't good, bad or scary; it was just what I had to do to make the programs I wanted to make.

neomech | 2 days ago

For me the 'scary' part of machine code was never the actual logic. It was always just staring at that wall of hex or mnemonics and feeling like I needed a secret decoder ring!

dedicate | 2 days ago

Machine code was only "scary" in the old days when you had to reboot your system when you made a small mistake.

amelius | 2 days ago

I always thought machine code was something only experts could understand. But after reading this article, I realized the basic concepts aren’t that complicated; it’s really just instructions, registers, and memory. I feel like this has given me a clearer understanding when it comes to writing code.

Leo-thorne | a day ago

The thing is: most programmers see assembly language generated by a compiler, so there are no comments, and in optimised code with vector operations it IS scary.

renox | 16 hours ago

This is the video I wished I had seen when I was a kid, feeling like assembly was a dark art that I was too dumb to be able to do. Later in life I did a ton of assembly professionally on embedded systems. But as a kid I thought I wasn’t smart enough. This idea is poison, thinking you’re not smart enough, and it ruins lives.

https://youtu.be/ep7gcyrbutA?si=8HiMqH2mMwsJRNDg

unoti | 2 days ago

Oh, cool. A couple years ago I spent a few days disassembling a small x86-64 binary by hand. Getting familiar with the encoding was a lot of fun! The following reference was indispensable:

http://ref.x86asm.net/

xelxebar | a day ago

Indeed. In Knuth’s The Art of Computer Programming the machine code was not the “scariest” part. (Programming is hard.)

Koshkin | 17 hours ago

Is it enough to play Human Resource Machine!? https://en.wikipedia.org/wiki/Human_Resource_Machine

Assembly as a game, I loved playing it.

kgilpin | 2 days ago

OK, but what about VLIW ASM? Have you ever seen Elbrus ASM, with predicated code, an asynchronous Array Prefetch Buffer, rotating registers, DAM (a hardware table for memory-dependency disambiguation), register windows, etc.? It's really hard to even start reading it.

redf1sh | 2 days ago

When I was last working with machine code, I found capstone to be very useful. Even just reading the source was helpful for some of the conditionally present amd64 fields.

https://github.com/capstone-engine/capstone

davemp | 2 days ago

> But what if we want to represent a number larger than 12 bits? Well, the add instruction doesn't let us represent all such numbers; but setting sh to 1 lets us shift our number by 12 bits. So for example we can represent 172032172032 by leaving our 42 alone and setting sh to 1. This is a clever technique for encoding larger numbers in a small space.

This is whacky. So presumably adding with a 24-bit constant whose value isn't representable in this compressed 13-bit format would then expand to two add instructions? Or would it store a constant to a register and then use that? Or is there a different single instruction that would get used?

(Also, typo, 42 << 12 is 172032).
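
The encoding trick in the quote is easy to check numerically. A sketch (the function name is made up; the 12-bit immediate and `sh` flag follow the AArch64 "add (immediate)" format under discussion). For constants that fit neither form, compilers typically emit two adds, or first materialise the value in a register with mov/movk:

```python
# AArch64-style add-immediate values: a 12-bit immediate, optionally
# shifted left by 12 when the sh flag is set.
def add_imm_value(imm12, sh):
    assert 0 <= imm12 < (1 << 12), "immediate must fit in 12 bits"
    return imm12 << (12 if sh else 0)

print(add_imm_value(42, 0))   # 42
print(add_imm_value(42, 1))   # 172032  (i.e. 42 << 12)

# A constant like 172033 fits neither form, so it gets split, e.g.:
#   add x0, x0, #42, lsl #12
#   add x0, x0, #1
```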

kibwen | 2 days ago

Well, machine code is scary. But like with many scary things, once you overcome your fears and get familiar with the thing, you realize that it is not that bad.

GuB-42 | 2 days ago

This is a tangent, but yesterday I was just pondering the abstraction layers from machine code to assemblers to compilers to interpreted languages. Having been lucky enough to be born at a time to witness these shifts, it's so easy to forget what we used to think was normal.

The thought came to me when testing the new Jules agentic coding platform. It occurred to me that we are just in another abstraction cycle, but like those before, it's so hard to see the shift when everything is shifting.

My conclusion? There will be non-AI coders, but they will be as rare as C programmers before we realize it.

elif | 2 days ago

Thank you Jimmy, great article.

My 23+ years of experience in computer science and programming have been a zebra of black-or-white moments. Most of the time, things are obscure, complicated, dark and daunting. Until suddenly you stumble upon a person who can explain them in simple terms and focus on the important bits. You can then put this new knowledge into a well-organized hierarchy in your head and suddenly become wiser and empowered.

"Writing documentation", "talking at conferences", "chatting at the water cooler", "writing a blog" and all the other discussions from Twitter to mailing lists are all about trying to get some ideas and understanding from one head into another, so more people can be elucidated and build further.

And oh my, how hard that is. We are lucky to sometimes have enlightenment through great RTFMs.

oleganza | 2 days ago

I think this article is missing a major point, or perhaps should be titled "Some Non-Scary Machine Code Isn't Scary". It argues that machine code isn't scary by building a one-to-one mapping from machine code to assembly code, and then taking it as given that assembly code isn't scary. But it uses two examples -- 32-bit ARM and x86-64 -- where this one-to-one mapping isn't valid. When in Thumb mode for (some flavors of) ARM, even when you know you're in Thumb mode, instructions can be a mix of 16 and 32 bits. And in the x86 world, of course, instructions can be a wide range of widths.

What that means is that if you're given a chunk of memory that is known to contain executable instructions... you /can't/ build a one-to-one mapping to assembly without knowing where all of the entry points are. For well-formed code you can often exclude almost all possible entry points as invalid, and maybe even end up with only a single one... but it's perfectly possible (and quite fun) to write machine code that has valid, different behavior for different entry points to the same byte sequence. There's no way to reduce this type of machine code to meaningful assembly, and it should be considered scary.
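
The entry-point ambiguity can be demonstrated with a toy decoder. The three opcodes and their lengths are real x86 encodings (`05` is `add eax, imm32`, five bytes; `90` is `nop`; `C3` is `ret`); the decoder itself is a deliberately minimal sketch:

```python
# Toy decoder for three real x86 opcodes, showing why variable-length
# machine code has no unique disassembly without known entry points.
LENGTHS = {0x05: 5, 0x90: 1, 0xC3: 1}   # add eax,imm32 / nop / ret
NAMES   = {0x05: "add eax, imm32", 0x90: "nop", 0xC3: "ret"}

def decode(code, entry):
    out, i = [], entry
    while i < len(code):
        op = code[i]
        out.append(NAMES[op])
        i += LENGTHS[op]
    return out

code = bytes([0x05, 0x90, 0x90, 0x90, 0x90, 0xC3])
print(decode(code, 0))  # ['add eax, imm32', 'ret']
print(decode(code, 1))  # ['nop', 'nop', 'nop', 'nop', 'ret']
```

The same six bytes are two entirely different instruction streams depending on where execution enters.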

addaon | 2 days ago

That is one thing that was nice about DOS: you were close to the machine. I never fully got machine language, but it was fun trying.

IIRC, debug.com could be used to create programs in machine language.

jmclnx | 2 days ago

Machine code ceased to be scary, or at least mysterious, to me when I opened the rudimentary debugger on a TRS-80. It was really more of a monitor, and it showed the contents of a certain chunk of memory in the top half of the screen. I loaded a program I was working on into it, and began changing instructions in memory, using the assembler output as my guide, and jumping into the program to see what the effects were. After that the little lightbulb went off. Oh, these are just bytes in memory that correspond to CPU instructions, and the CPU just reads them off and executes the instructions.

bitwize | a day ago

I think machine code and building something like a Forth is way easier to understand than any contemporary programming language toolchain.

thuanao | a day ago
