ahead-of-time wasm gc in wastrel
Hello friends! Today, a quick note: the Wastrel ahead-of-time WebAssembly compiler now supports managed memory via garbage collection!
hello, world
The quickest demo I have is that you should check out and build wastrel itself:
git clone https://codeberg.org/andywingo/wastrel
cd wastrel
guix shell
# alternately: sudo apt install guile-3.0 guile-3.0-dev \
#    pkg-config gcc automake autoconf make
autoreconf -vif && ./configure
make -j
Then run a quick check with hello, world:
$ ./pre-inst-env wastrel examples/simple-string.wat
Hello, world!
Next, try gcbench, a classic GC micro-benchmark:
$ WASTREL_PRINT_STATS=1 ./pre-inst-env wastrel examples/gcbench.wat
Garbage Collector Test
Creating long-lived binary tree of depth 16
Creating a long-lived array of 500000 doubles
Creating 33824 trees of depth 4
Top-down construction: 10.189 msec
Bottom-up construction: 8.629 msec
Creating 8256 trees of depth 6
Top-down construction: 8.075 msec
Bottom-up construction: 8.754 msec
Creating 2052 trees of depth 8
Top-down construction: 7.980 msec
Bottom-up construction: 8.030 msec
Creating 512 trees of depth 10
Top-down construction: 7.719 msec
Bottom-up construction: 9.631 msec
Creating 128 trees of depth 12
Top-down construction: 11.084 msec
Bottom-up construction: 9.315 msec
Creating 32 trees of depth 14
Top-down construction: 9.023 msec
Bottom-up construction: 20.670 msec
Creating 8 trees of depth 16
Top-down construction: 9.212 msec
Bottom-up construction: 9.002 msec
Completed 32 major collections (0 minor).
138.673 ms total time (12.603 stopped); 209.372 ms CPU time (83.327 stopped).
0.368 ms median pause time, 0.512 p95, 0.800 max.
Heap size is 26.739 MB (max 26.739 MB); peak live data 5.548 MB.
We set WASTREL_PRINT_STATS=1 to get those last 4 lines. So, this is a microbenchmark: it runs for only 138 ms, and the heap is tiny (26.7 MB). It does collect 32 times, though, which is something.
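If you want to pull these numbers out programmatically, here is a minimal Python sketch. It assumes the summary format is exactly as in the sample output above: the regular expressions are fitted to that one sample, not to any documented spec. It also derives two ratios that are not printed directly. The first is the share of total time reported as "stopped". The second is heap size relative to peak live data.

```python
import re

# Summary lines printed with WASTREL_PRINT_STATS=1 for gcbench (copied from above).
# The format is assumed from this single sample, not from any documented spec.
STATS = """\
Completed 32 major collections (0 minor).
138.673 ms total time (12.603 stopped); 209.372 ms CPU time (83.327 stopped).
0.368 ms median pause time, 0.512 p95, 0.800 max.
Heap size is 26.739 MB (max 26.739 MB); peak live data 5.548 MB.
"""

def parse_stats(text):
    """Extract the headline numbers from a stats summary into a dict."""
    major = re.search(r"Completed (\d+) major collections", text)
    times = re.search(
        r"([\d.]+) ms total time \(([\d.]+) stopped\); "
        r"([\d.]+) ms CPU time \(([\d.]+) stopped\)", text)
    heap = re.search(r"Heap size is ([\d.]+) MB.*peak live data ([\d.]+) MB", text)
    if not (major and times and heap):
        raise ValueError("unrecognized stats format")
    return {
        "major_collections": int(major.group(1)),
        "wall_ms": float(times.group(1)),
        "wall_stopped_ms": float(times.group(2)),
        "cpu_ms": float(times.group(3)),
        "cpu_stopped_ms": float(times.group(4)),
        "heap_mb": float(heap.group(1)),
        "live_mb": float(heap.group(2)),
    }

s = parse_stats(STATS)
print(f"{s['major_collections']} major collections")
# Fraction of total (wall-clock) time reported as "stopped".
print(f"stopped: {s['wall_stopped_ms'] / s['wall_ms']:.1%} of total time")
# How much bigger the heap is than the peak amount of live data.
print(f"heap / peak live: {s['heap_mb'] / s['live_mb']:.1f}x")
```

For the gcbench run, this reports about 9.1% of the 138.7 ms total time as stopped. It also shows a heap roughly 4.8 times the size of the peak live data.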
is it good?
I know what you are thinking: OK, it’s a microbenchmark, but can it tell us anything about how Wastrel compares to V8? Well, probably so:
$ guix shell node time -- \
time node js-runtime/run.js -- \
js-runtime/wtf8.wasm examples/gcbench.wasm
Garbage Collector Test
[... some output elided ...]
total_heap_size: 48082944
[...]
0.23user 0.03system 0:00.20elapsed 128%CPU (0avgtext+0avgdata 87844maxresident)k
0inputs+0outputs (0major+13325minor)pagefaults 0swaps
Which is to say, V8 takes more CPU time (230 ms of user time vs 209 ms) and more wall-clock time (200 ms vs 138 ms). It also uses almost twice as much managed memory (48 MB vs 26.7 MB), and more than twice as much for the process as a whole (88 MB vs 34 MB peak resident size, not shown).
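GNU time's summary line packs several figures together. The Python sketch below unpacks the V8 gcbench line above and recomputes the comparisons in this paragraph. The regular expression is written against this one sample of GNU time's default output and is an assumption, not a general parser. The Wastrel heap figure is simply copied from its stats output.

```python
import re

# First line of GNU time's default output for the node/V8 gcbench run (copied from above).
V8_TIME = ("0.23user 0.03system 0:00.20elapsed 128%CPU "
           "(0avgtext+0avgdata 87844maxresident)k")

def parse_gnu_time(line):
    """Pull user/system/elapsed time and peak RSS out of GNU time's default line."""
    m = re.search(r"([\d.]+)user ([\d.]+)system ([\d:.]+)elapsed .* (\d+)maxresident", line)
    if m is None:
        raise ValueError("not a GNU time summary line")
    # Elapsed time is printed as [hours:]minutes:seconds; fold it into seconds.
    elapsed = 0.0
    for part in m.group(3).split(":"):
        elapsed = elapsed * 60 + float(part)
    return {
        "user_s": float(m.group(1)),
        "sys_s": float(m.group(2)),
        "elapsed_s": elapsed,
        # maxresident is reported in Kbytes; dividing by 1000 matches the rounded "MB" in the text.
        "maxrss_mb": int(m.group(4)) / 1000,
    }

v8 = parse_gnu_time(V8_TIME)
v8_heap_mb = 48082944 / 1e6   # node's total_heap_size, reported in bytes
wastrel_heap_mb = 26.739      # "Heap size is 26.739 MB" from WASTREL_PRINT_STATS

print(f"V8 user time: {v8['user_s'] * 1000:.0f} ms, wall clock: {v8['elapsed_s'] * 1000:.0f} ms")
print(f"V8 peak RSS: {v8['maxrss_mb']:.0f} MB")
print(f"managed heap ratio (V8 / Wastrel): {v8_heap_mb / wastrel_heap_mb:.1f}x")
```

The heap ratio comes out at about 1.8x, which is where "almost twice as much managed memory" comes from.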
improving on v8, really?
Let’s try with quads, which at least has a larger active heap size. This time we’ll compile a binary and then run it:
$ ./pre-inst-env wastrel compile -o quads examples/quads.wat
$ WASTREL_PRINT_STATS=1 guix shell time -- time ./quads
Making quad tree of depth 10 (1398101 nodes).
construction: 23.274 msec
Allocating garbage tree of depth 9 (349525 nodes), 60 times, validating live tree each time.
allocation loop: 826.310 msec
quads test: 860.018 msec
Completed 26 major collections (0 minor).
848.825 ms total time (85.533 stopped); 1349.199 ms CPU time (585.936 stopped).
3.456 ms median pause time, 3.840 p95, 5.888 max.
Heap size is 133.333 MB (max 133.333 MB); peak live data 82.416 MB.
1.35user 0.01system 0:00.86elapsed 157%CPU (0avgtext+0avgdata 141496maxresident)k
0inputs+0outputs (0major+231minor)pagefaults 0swaps
Compare to V8 via node:
$ guix shell node time -- \
time node js-runtime/run.js -- \
js-runtime/wtf8.wasm examples/quads.wasm
Making quad tree of depth 10 (1398101 nodes).
construction: 64.524 msec
Allocating garbage tree of depth 9 (349525 nodes), 60 times, validating live tree each time.
allocation loop: 2288.092 msec
quads test: 2394.361 msec
total_heap_size: 156798976
[...]
3.74user 0.24system 0:02.46elapsed 161%CPU (0avgtext+0avgdata 382992maxresident)k
0inputs+0outputs (0major+87866minor)pagefaults 0swaps
Which is to say, Wastrel is almost three times as fast, while using roughly a third of the memory: 2460 ms (V8) vs 849 ms (Wastrel), and 383 MB vs 141 MB.
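The ratios behind "almost three times" are easy to check. Here are a few lines of Python that use only the figures quoted in this paragraph:

```python
# Figures quoted in the text for the quads run.
v8_ms, wastrel_ms = 2460, 849          # V8: GNU time elapsed; Wastrel: its own "total time" stat
v8_rss_mb, wastrel_rss_mb = 383, 141   # peak resident set size, from GNU time

speedup = v8_ms / wastrel_ms           # how many times faster Wastrel finishes
mem_ratio = v8_rss_mb / wastrel_rss_mb # how many times more memory V8 uses at peak

print(f"Wastrel runs {speedup:.1f}x faster and uses {mem_ratio:.1f}x less peak memory,")
print(f"i.e. {wastrel_rss_mb / v8_rss_mb:.0%} of V8's peak footprint.")
```

That gives 2.9x for time and 2.7x for memory, with Wastrel at 37% of V8's peak footprint. Using Wastrel's GNU time elapsed figure (0:00.86, i.e. 860 ms) instead of its internal total time changes little, because 2460 / 860 still rounds to 2.9.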
zowee!
So, yes, the V8 times include the time to compile the wasm module on the fly. No idea what is going on with tiering, either, but I understand that tiering up is a thing these days; this is node v22.14, released about a year ago, for what that's worth. Also, there is a V8-specific module to do some impedance-matching with regard to strings: in Wastrel they are WTF-8 byte arrays, whereas in Node they are JS strings. But it's not a string benchmark, so I doubt that's a significant factor.
I think the performance edge comes from having the program ahead of time: you can statically allocate type checks and object shapes, and the compiler can see through it all. But I don't really know yet, as I just got everything working this week.
Wastrel with GC is demo-quality, thus far. If you’re interested in the back-story and the making-of, see my intro to Wastrel article from October, or the FOSDEM talk from last week:
Slides here, if that’s your thing.
More to share on this next week, but for now I just wanted to get the word out. Happy hacking and have a nice weekend!