Show HN: Nova JavaScript Engine
This is a great idea! I had thought about doing this with a lisp interpreter. I had identified a few key advantages:
- homogenous allocation means no alignment gaps - linear access win in garbage collection - indices smaller than pointers - type discriminated index can save some size
I haven’t verified whether those actually work out in the details. I’ll read your blog article.
Don’t bother with these comments immediately comparing it to V8 (a multi billion dollar venture). I don’t know how many creative projects they’ve done before.
You may be be interested in looking at Fabrice Bellard’s JS engine for ideas.
A short comparison to how V8 does these things:
1. All data in V8 is allocated into one of many heap parts: Usually new data goes into a nursery space, and if it does not get GC'd it moves to the old space. Relative position of data isn't really guaranteed at this point.
2. All heap references in V8 are true pointers or, if pointer compression is used, offsets from the heap base.
3. All objects in V8 include all the data needed for them to act as objects, and all of their data is stored in a single allocation (with the exception of properties, with some exceptions). The more specialised an object is, say an ArrayBuffer, Uint8Array, or a DataView, the bigger it has to be as the specialisation requires more data to be stored.
Isn't data oriented design driven by knowing what your data accesses look like? In your engine, you're building as if you're assuming that common data access will be linear access over objects of the same type. Why?
I recommend reading /Don’t Stop the BIBOP: Flexible and Efficient Storage Management for Dynamically Typed Languages/ (1994)[1]. "BIBOP" stands for "Big Bag of Pages."
Is this an experimental only JS engine or do you aim to implement the entire ECMAscript specification?
I have been following the Rust Boa project, but I think that it isn't production ready, yet. https://github.com/boa-dev/boa
"Numbers go into the numbers vector" is unusual - typically JS engines use either NaN-boxing or inline small integers (e.g. v8 SMI). I suppose this means that a simple `this.count += 1` will always allocate.
Have you considered using NaN-boxing? Also, are the type-specific vectors compacted by the GC, or do they maintain a free list?
Do you have some specific application profile in mind?
Sounds like this approach could be useful for games that embed a scripting engine. In that context it might be interesting to eventually see some benchmarks against usual suspects of game scripting like Lua.
What exactly is meant by the word "kind" here when you say kind-specific vectors?
If I have `function X(a) { this.a = a; }` and then `function Y(b) { this.b = b; }` does that mean `new X(1)` and `new Y(2)` are considered objects of different kinds?
And what about creating objects with literals: are `{a: 1}` and `{b: 2}` considered objects of different kinds?
I’m wondering how this interacts with the “young objects mostly die” assumption of a generational garbage collector. It seems like using an arena for the young generation might work better for some programs, while an ECS-like scheme works better for other programs.
Does Nova include a JIT compiler? Or just an interpreter?
doesn't using these sorts of data structures turn the security safety properties of rust into silent logic errors?
Wild, interested to see how Nova would perform on a benchmark suite.
Well, Devs of V8, Spidermonkey, Webkit and GraalJS are all on HN. Hopefully they see this and all chime in.
Do you plan on supporting TCO? I was disappointed to learn a few years ago that V8 wouldn't, on the grounds that, IIRC, it would confuse developers.
True tail call recursion and lazy evaluation would enable truly functional JS.
I'm coming at this with no real JavaScript implementation knowledge, so these comments might not even make sense...
The data sorting seems quite cleanly at first, but as I think more about it I don't quite get it. I guess you are saving a bit of space by segmenting by type... in another approach you might have the type on the pointer, and the pointer can point to anything, and so it's potentially a bit longer than having a type and pointer(/index) that points into a smaller portion of memory specific to that type. But enough to matter?
"No, pointers we do not want and cannot have, so the only real option is to use indexes. Indexes have a lot of benefits: They are small, work exceedingly well together with our heap vectors, enable using the same value to index into multiple heap vectors (or slices of the same heap vector), perform a form of pointer compression automatically, and offer great protection from safety vulnerabilities as reinterpreting an index as a different type changes both the type and the memory it indexes into."
That all just sounds like a pointer to me? The last case also seems like a security hole, not protection.
"Not all objects are the same: They differ in their usage and their capabilities. An object-oriented reading of JavaScript objects' capabilities and the ECMAScript specification would give you a clear and simple inheritance graph where the ordinary object is the base object class, and Arrays, DataViews, Maps, and others inherit from that. Not all objects are the same: They differ in their usage and their capabilities. An object-oriented reading of JavaScript objects' capabilities and the ECMAScript specification would give you a clear and simple inheritance graph where the ordinary object is the base object class, and Arrays, DataViews, Maps, and others inherit from that."
It seems like you are special-casing a specific set of object types (like Array), which is very justifiable. So sure.
"This is somewhat more of an aim for the future instead of current reality, but allow me to give some easy examples: The ArrayBuffer object in ECMAScript supports allocating up to 2^53 bytes of data. Most engines only allow a tad bit over 2^32 bytes but nevertheless, the fact of the matter is that you need more than 4 bytes to store that byte value. As a result, ArrayBuffer itself but also DataView and all the various TypedArray variants like Uint8Array must carry within them 8 byte data fields for byte offset, byte length, and even array length. Now ask yourself, how often do you deal with ArrayBuffers larger than 4 GiB? Not very often, obviously."
I'm guessing this is leading to a decision many languages have made about numbers and strings, where there's special types for small numbers and short strings (exposed only in the implementation). Or even more special types, where the pointers become values.
Also I can see a benefit to keeping track of "normal" Arrays and whatnot, so some of JavaScripts weird-but-not-usually-used behavior can be isolated, and normal behavior fast-tracked.
"In Nova we aim to split objects into parts to ensure that computationally unconnected parts are also stored separately in memory"
But this I don't get. If you are splitting things by type, how can you cluster them by how they are related? An object like {a: 1, b: 2} is an object with two strings and two numbers, presumably spread out over three different type-specific heaps?
Obligatory "Torille!" as a fellow Finn.
Fun coincidence that you started this project, I've had this exact same idea brewing for a few years, but did not bite the bullet yet :D
Have you considered using Bevy as a base ECS as they have an automatic archetype (shape) handling in the library? This was essentially my original idea, to implement a JS runtime on top of Bevy. (And over the years slap together a browser after the JS starts working)
[dead]
Apologies for an insubstantial complaint, but there are just too many things called “Nova” (or its Greek counterpart, “Neo”). I get it, it’s new, but isn’t there anything more specific or unique to reflect in the project name?
[flagged]
[flagged]
Architectural choices are interesting to talk about, but I think most people reading this won't have any context to compare against, me included. How does this compare to e.g. the architecture of V8? What benefits do these choices give when compared against other engines? Etc, reading through the list it's easy to nod along, but it's hard to actually have an intuition about whether these are good choices or not.