Why is Apple Rosetta 2 fast? (2022)

fanf2 | 170 points

Post got the big one: Total Store Ordering (TSO).

The rest are all techniques in reasonably common use, but without hardware support for x86's strong memory ordering you cannot get very good x86-on-ARM performance. It is by no means clear from inspecting existing code when strong memory ordering matters and when it doesn't, so you have to liberally sprinkle memory barriers around, which really kills performance.
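To make the ordering problem concrete, here is a minimal sketch of the classic message-passing pattern. Everything in it (the function name `run_message_passing`, the variables `payload` and `ready`) is hypothetical and chosen for illustration; it is not taken from Rosetta 2 or from the article. In source form the ordering requirement is visible as release/acquire annotations, but a compiled x86 binary contains only ordinary stores and loads here, because x86's TSO already keeps stores in order with other stores and loads in order with other loads. A translator that sees only those plain instructions cannot easily tell which ones other threads depend on.

```cpp
#include <atomic>
#include <thread>

// Hypothetical message-passing example; all names are illustrative.
// Returns the value the consumer thread observed in `payload`.
int run_message_passing() {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = 0;

    std::thread producer([&] {
        payload = 42;  // ordinary data store
        // Publish the data. On x86 this compiles to a plain store;
        // on a weakly ordered ARM core it needs a store-release or a barrier.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Wait for the flag. On x86 this is a plain load;
        // on ARM it needs a load-acquire or a barrier.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // guaranteed to read 42 once `ready` is observed
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Without the release/acquire pair (or an equivalent hardware TSO mode), a weakly ordered CPU would be allowed to make the flag visible before the payload. An emulator that cannot prove a given store is thread-private would have to treat every store as if it were the `ready` store.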

The huge and fast L1I/L1D caches don't hurt either... emulation tends to be cache-intensive.

Syonyk | 2 days ago

Super interesting. Putting my PM hat on, I wonder: how many x86 apps on Apple still benefit from this much performance? What's the coverage? The switch to M1 happened 4 years ago, so the software was designed for hardware nearly half a decade old.

Excellent engineering and nice that it was built properly. Is this something that Linux / Wine / the Steam compatibility layer already benefit from?

leshokunin | 2 days ago

Standardization of future Arm PCs, https://news.ycombinator.com/item?id=42182442

  The Arm PC Base System Architecture 1.0 (PC-BSA) specifies a standard hardware system architecture for Personal Computers (PCs) that are based on the Arm 64-bit Architecture. PC system software, for example operating systems, hypervisors, and firmware can rely on this standard system architecture. PC-BSA extends the requirements specified in the Arm BSA.

transpute | 2 days ago

Tangent: also why orbstack, a Docker replacement on Mac, is fast [1] (I'm not affiliated in any way, just a fan and happy user :-).

--

1: https://docs.orbstack.dev/features

emmanueloga_ | 2 days ago

I wonder if these lessons might be applied to Wasm runtimes where the Wasm could be JIT compiled into native code. Of course this does raise the possibility of security concerns if the Wasm compilation has some bug, and then of course there’s also the question of whether Wasm’s requirements might mean native compilation doesn’t give much of a performance boost (as seems to be the case with e.g., Java byte code).

dhosek | 2 days ago

One other thing that is not mentioned is that Apple has an extension to compute rarely used x86 flags such as the parity flag in hardware rather than in software.
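For a sense of what computing this in software involves, here is a rough sketch of the x86 parity flag rule: PF is set when the least-significant byte of a result contains an even number of 1 bits. The function name `x86_parity_flag` is made up for this example, and this is not a claim about what code any particular translator emits.

```cpp
#include <cstdint>

// Hypothetical helper: computes the x86 parity flag (PF) for a result value.
// PF is 1 when the low 8 bits of the result contain an even number of set bits.
bool x86_parity_flag(std::uint32_t result) {
    std::uint8_t low = static_cast<std::uint8_t>(result & 0xFF);
    // Fold the byte onto itself so that bit 0 ends up as the XOR of all 8 bits.
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;
    // Bit 0 is 1 for an odd number of set bits; PF is the inverse of that.
    return (low & 1) == 0;
}
```

Even this short sequence is several extra instructions after every flag-setting operation unless the translator can prove the flag is never read, which is why doing it in hardware is attractive.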

kccqzy | 2 days ago

(2022)

brycewray | 2 days ago

Good article.

NL807 | 2 days ago
