wingolog: A mostly dorky weblog by Andy Wingo (https://wingolog.org/)

arrow functions coming to chrome 45!
2015-06-18
https://wingolog.org/2015/06/18/arrow-functions-coming-to-chrome-45

It's been a long time coming, but I just flipped the bit in V8 that will ship arrow functions in Chrome 45! Woo hoo!

You probably know, but arrow functions are a new way to write functions in JavaScript. They look like this:

// Two arguments, body implicitly returned.
(x, y) => x + y

// With just one argument, no parentheses needed.
x => x * 2

// Body can have braces too; in that case use "return".
x => { return x * 2 }

Relative to the other kind of function that is written like function (x) { return x * 2 }, arrow functions don't define this or arguments in their bodies, instead capturing these values from the environment. There are a couple of other minor differences, too, but instead of writing about them here I'll just point to the great article by Jason Orendorff of the SpiderMonkey team.
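To make that concrete, here's a small example of the difference (the counter object and names are mine, just for illustration):

var counter = {
  count: 0,
  start: function () {
    // The arrow has no `this` of its own: it captures the `this` of
    // start(), so this.count refers to the counter object.
    setInterval(() => this.count++, 1000);
    // A classic function expression here would get its own `this`
    // (the global object, in a sloppy-mode page), and this.count
    // would silently refer to the wrong object.
  }
};

counter.start();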

Arrow functions are part of the JavaScript language standard that was called "ECMAScript 6" or ES6, and I guess you could still call it that. It seems like a silly thing for the committee to throw away all their branding like that, but they decided to rename it ECMAScript 2015, a name which I'm sure the pedants are glad I have included. The upshot is that the standard is now final, gold master, etched in stone, which from an implementor's perspective is a relief. You can practically feel the anxiety ebbing away by the happy rate at which commits bubble out of source repositories and into shipping browsers, free from the fear that some spec change will force the hack-stream to change course.

From the V8 side, our arrow function implementation has also been a long time coming. My colleague Adrián Pérez did the first half of the work, and I picked up on the back end of things. It seems like such a small feature and in many ways it is, but still it took a long time. Now I know that my readers are a bunch of nerds and many of you like implementing languages, so you might appreciate these nargish points.

One of the first bits is that arrow functions are hard to parse. Consider that this is a valid JavaScript expression:

(x,y)

It's a "comma expression" that will evaluate x then y and its result will be the result of evaluating y. But add an arrow on after the end and you get not an expression but a formal parameter list:

(x,y)=>x+y

Now you might think, well OK, when you see an arrow, rewind the input stream and parse in "arrow function mode". Indeed that would be fine, but not in combination with some additional ES6 features, optional and destructuring arguments. Optional arguments look like this:

(x=42)=>x

The =42 part is the expression that will be evaluated to give x a value, if the function is called with no arguments. Note that this bit is still under implementation in V8 so you can't try it in your browser. An optional argument initializer is an expression and not a value, so you can also have:

(x=(x)=>42)=>x

Combined, this makes rewinding the token stream a proposition of exponential complexity, which is a no-go for a production JavaScript parser. Parsers are on the hot path for page-load times and no browser vendor wants to introduce a pathological case into their page load.

Instead, V8 does something I hadn't seen before. It keeps an open mind about whether something is a comma expression or a formal parameter list of an arrow function, and only makes a decision when it sees the => (or not). As it parses, V8 records places that it would signal an error for either a parameter list or for an expression, and then when that superimposed wave function collapses it checks that the production is valid, signalling the appropriate error if not. I thought this was a really neat trick, so if you're into that kind of thing, see the expression classifier for the details.
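Here is a toy of the idea in JavaScript -- nothing like V8's actual code, just the shape of the trick. We note the error each interpretation would produce as we go, and only consult the right set once we know whether an arrow follows:

function classify(items, hasArrow) {
  var seen = Object.create(null);
  var exprError = null, paramError = null;  // would-be errors
  items.forEach(function (item) {
    // `42` is a fine expression but can't be a parameter name.
    if (/^\d/.test(item) && !paramError)
      paramError = new SyntaxError('unexpected number in parameter list');
    // `(x, x)` is a fine comma expression, but arrow functions
    // forbid duplicate parameter names.
    if (seen[item] && !paramError)
      paramError = new SyntaxError('duplicate parameter name ' + item);
    seen[item] = true;
    // (No expression-side errors arise in this toy, so exprError
    // stays null; a real classifier records both kinds.)
  });
  // Only now do we know which recorded errors apply.
  var error = hasArrow ? paramError : exprError;
  if (error) throw error;
  return hasArrow ? 'arrow parameter list' : 'comma expression';
}

classify(['x', 'y'], true);    // 'arrow parameter list'
classify(['x', '42'], false);  // 'comma expression'
classify(['x', '42'], true);   // throws SyntaxError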

The other thing that's tricky about arrow functions is the this binding. In JavaScript, this is basically a hidden parameter passed to a function when it is called. Calling a function like o.f() passes the value of o to f as its this parameter. If instead f() is called directly, like with no dot before the call, then undefined is passed as this. Also for sloppy-mode functions, if the passed this value isn't an object, then the global object instead is assigned to this. Finally outside a function, this is bound to the global object.
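In code, those rules look like this (a sloppy-mode sketch):

var o = {
  f: function () { return this; }
};
var f = o.f;

o.f() === o;   // true: o is passed as the `this` parameter
f() === this;  // true at a sloppy-mode top level: undefined was
               // passed, so the global object is substituted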

OK, I know all of you know these things. Thing is, you always have a this, and although it's like a variable it's not a valid variable name, and before ES6 nothing could capture its value, because each function has its own this value. Perhaps you see where I'm going with this (ahem) now. Arrow functions introduce a function scope that doesn't have a this value, and that indeed might capture some other scope's this value, forcing it to be context-allocated. Other parts of ES6 can actually force assignment to this, like a super call, and that assignment can actually come from within an arrow function. Zounds! A simple concept, but there was a lot of incidental complexity in V8 around the implementation. Between Adrián and myself it took like three months to fix this usage in V8 to always just go through the (possibly context-allocated) variable, and there are still probably some devtools bugs to find in the upcoming weeks.
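For example, here is a super call assigning this from within an arrow function. (Class syntax is used here for illustration; like default arguments, it was also still in progress in engines at the time.)

class A { constructor() { this.x = 42; } }
class B extends A {
  constructor() {
    // The arrow has no `this` binding of its own; running super()
    // through it initializes `this` for B's constructor frame.
    var init = () => super();
    init();
  }
}

new B().x; // 42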

Performance-wise, arrow functions are just like functions. They should be just as fast as if you wrote them with function. So use them with joy, use them with abandon, use them judiciously -- however you decide to use them, don't let perf influence your decision one way or the other.

That's about it! Like all of my JS engine work over the past couple years, this hacking was sponsored by fabulous folks over at Bloomberg, so big ups to them. From me and Adrián at Igalia, until next time! We leave you to puzzle out what this bit of JavaScript evaluates to:

(({},{},({},{})=>({},{}))=>(({},{})=>({},{}),{},{}))({},{})

Happy hacking!

generators in firefox now twenty-two times faster
2014-11-14
https://wingolog.org/2014/11/14/generators-in-firefox-now-twenty-two-times-faster

It's with great pleasure that I can announce that, thanks to Mozilla's Jan de Mooij, the new ES6 generator functions are twenty-two times faster in Firefox!

Some back-story, for the unawares. There's a new version of JavaScript coming, ECMAScript 6 (ES6). Among the new features that ES6 brings are generator functions: functions that can suspend. Firefox's JavaScript engine, SpiderMonkey, has had support for generators for many years, long before other engines. This support was upgraded to the new ES6 standard last year, thanks to sponsorship from Bloomberg, and was shipped out to users in Firefox 26.

The generators implementation in Firefox 26 was quite basic. As you probably know, modern JavaScript implementations have a number of tiered engines. In the case of SpiderMonkey there are three tiers: the interpreter, the baseline compiler, and the optimizing compiler. Code begins execution in the interpreter, which is the quickest engine to start. If a piece of code is hot -- meaning that lots of time is being spent there -- then it will "tier up" to the next level, where it is analyzed, possibly optimized, and then compiled to machine code.

Unfortunately, generators in SpiderMonkey have always been stuck at the lowest tier, the interpreter. This is because of SpiderMonkey's choice of implementation strategy for generators. Generators were implemented as "floating interpreter stack frames": heap-allocated objects whose shape was exactly the same as a stack frame in the interpreter. This had the advantage of being fairly cheap to implement in the beginning, but ultimately it made them unable to take advantage of JIT compilation, as JIT code runs on its own stack which has a different layout. The previous strategy also relied on trampolining through a helper written in C++ to resume generators, which killed optimization opportunities.

The solution was to represent suspended generator objects as snapshots of the state of a stack frame, instead of as stack frames themselves. In order for this to be efficient, last year we did a number of block scope optimizations to try and reduce the amount of state that a generator frame would have to restore. Finally, around March of this year we were at the point where we could refactor the interpreter to implement generators on the normal interpreter stack, with normal interpreter bytecodes, with the vision of being able to JIT-compile those bytecodes.
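As an illustration of the snapshot idea, here it is hand-coded in plain JavaScript (a toy of mine, not SpiderMonkey's actual representation):

function makeCounter(n) {
  // The suspended "generator" is just the live locals plus a done
  // flag -- not a heap object shaped like an interpreter frame.
  var snapshot = { i: 0, done: false };
  return {
    next: function () {
      if (snapshot.done || snapshot.i >= n) {
        snapshot.done = true;
        return { value: undefined, done: true };
      }
      // "Resuming" restores i, runs to the next yield point, and
      // saves the state back into the snapshot.
      return { value: snapshot.i++, done: false };
    }
  };
}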

I ran out of time before I could land that patchset; although the patches were where we wanted to go, they actually caused generators to be even slower and so they languished in Bugzilla for a few more months. Sad monkey. It was with delight, then, that a month or so ago I saw that SpiderMonkey JIT maintainer Jan de Mooij was interested in picking up the patches. Since then he has been hacking off and on at getting my old patches into shape, and ended up applying them all.

He went further, optimizing stack frames to not reserve space for "aliased" locals (locals allocated on the scope chain), speeding up object literal creation in the baseline compiler, and finally implementing baseline JIT compilation for generators.

So, after all of that perf nargery, what's the upshot? Twenty-two times faster! In this microbenchmark:

function *g(n) {
    for (var i=0; i<n; i++)
        yield i;
}
function f() {
    var t = new Date();
    var it = g(1000000);
    for (var i=0; i<1000000; i++)
        it.next();
    print(new Date() - t);
}
f();

Before, it took SpiderMonkey 980 milliseconds to complete on Jan's machine. After? Only 43! It's actually marginally faster than V8 at this point, which has (temporarily, I think) regressed to 45 milliseconds on this test. Anyway. Competition is great and as a committer to both projects I find it very satisfactory to have good implementations on both sides.

As in V8, in SpiderMonkey generators cannot yet reach the highest tier of optimization. I'm somewhat skeptical that it's necessary, too, as you expect generators to suspend fairly frequently. That said, a yield point in a generator is, from the perspective of the optimizing compiler, not much different from a call site, in that it causes all locals to be saved. The difference is that locals may have unboxed representations, so we would have to box those values when saving the generator state, and unbox on restore.

Thanks to Bloomberg for funding the initial work, and big, big thanks to Mozilla's Jan de Mooij for picking up where we left off. Happy hacking with generators!

es6 generator and array comprehensions in spidermonkey
2014-03-07
https://wingolog.org/2014/03/07/es6-generator-and-array-comprehensions-in-spidermonkey

Good news, everyone: ES6 generator and array comprehensions just landed in SpiderMonkey!

Let's take a quick look at what comprehensions are, then talk about what just landed in SpiderMonkey and when you'll see it in a Firefox release. Thanks to Bloomberg for sponsoring this work.

comprendes, mendes

Comprehensions are syntactic sugar for iteration. Unlike for-of, which processes its body for side effects, an array comprehension processes its body for its values, collecting them into a new array. Like this:

// Before (by hand)
var foo = (function(){
             var result = [];
             for (var x of y)
               result.push(x*x);
             return result;
           })();

// Before (assuming y has a map() method)
var foo = y.map(function(x) { return x*x });

// After
var foo = [for (x of y) x*x];

As you can see, array comprehensions are quite handy. They're expressions, not statements, and so their result can be passed directly to whatever code needs it. This can make your program more clear, because you aren't forced to give names to intermediate values, like result. At the same time, they work on any iterable, so you can use them on more kinds of data than just arrays. Because array comprehensions don't make a new closure, you can access arguments and this and even yield from within the comprehension tail.
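For example, with the draft syntax just described (and assuming, as in ES6, that arguments is iterable):

function squares() {
  // No closure is created, so this `arguments` is squares()'s own
  // arguments object.
  return [for (x of arguments) x * x];
}

squares(1, 2, 3); // [1, 4, 9]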

Generator comprehensions are also nifty, but for a different reason. Let's look at an example first:

// Before
var bar = (function*(){ for (var x of y) yield x })();

// After
var bar = (for (x of y) x);

As you can see the syntactic win here isn't that much, compared to just writing out the function* and invoking it. The real advantage of generator comprehensions is their similarity to array comprehensions, and that often you can replace an array comprehension by a generator comprehension. That way you never need to build the complete list of values in memory -- you get laziness for free, just by swapping out those square brackets for the comforting warmth of parentheses.

Both kinds of comprehension can contain multiple levels of iteration, with embedded conditionals as you like. You can do [for (x of y) for (z of x) if (z % 2) z + 1] and all kinds of related niftiness. Comprehensions are almost always more concise than map and filter, with the added advantage that they are usually more efficient.
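A sketch of the comparison, writing that two-level example by hand:

var y = [[1, 2], [3, 4]];

// [for (x of y) for (z of x) if (z % 2) z + 1] evaluates to [2, 4].
// The same computation with the array methods allocates an
// intermediate array at every step:
var result = [].concat.apply([], y.map(function (x) {
  return x.filter(function (z) { return z % 2; })
          .map(function (z) { return z + 1; });
})); // [2, 4]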

what happened

SpiderMonkey has had comprehensions for a while now, but only as a non-standard language extension you have to opt into. Now that comprehensions are in the draft ES6 specification, we can expose them to the web as a whole, by default.

Of course, the comprehensions that ES6 specified aren't quite the same as the ones that were in SM. The obvious difference is that SM's legacy comprehensions were written the other way around: [x for (x of y)] instead of the new [for (x of y) x]. There were also a number of other minor differences, which I'll list here for posterity:

  • ES6 comprehensions create one scope per "for" node -- not one for the comprehension as a whole.

  • ES6 comprehensions can have multiple "if" components, which may be followed by other "for" or "if" components.

  • ES6 comprehensions should make a fresh binding on each iteration of a "for", although Firefox currently doesn't do this (bug 449811). Incidentally, for-of in Firefox has this same problem.

  • ES6 comprehensions only do for-of iteration, not for-in iteration.

  • ES6 generator comprehensions always need parentheses around them. (The parentheses were optional in some cases for SM's old generator comprehensions.)

  • ES6 generator comprehensions are ES6 generators (returning {value, done} objects), not legacy generators (StopIteration).

I should note in particular that the harmony wiki is out of date, as the feature has moved into the spec proper: array comprehensions, generator comprehensions.

For another fine article on ES6 comprehensions, check out Ariya Hidayat's piece from about a year ago.

So, ES6 comprehensions just landed in SpiderMonkey today, which means it should be part of Firefox 30, which should reach "beta" in April and become a stable release in June. You can try it out tomorrow if you use a nightly build, provided it doesn't cause some crazy breakage tonight. As of this writing, Firefox will be the first browser to ship ES6 array and generator comprehensions.

colophon

I had a Monday of despair: hacking at random on something that didn't solve my problem. But I had a Tuesday morning of pleasure, when I realized that my Monday's flounderings could be cooked into a delicious mid-week bisque; the hack was obvious and small and would benefit the web as a whole. (Wednesday was for polish and Thursday on another bug, and Friday on a wild parser-to-OSR-to-assembly-and-back nailbiter; but in the end all is swell.)

Thanks again to Bloomberg for this opportunity to build out the web platform, and to Mozilla for their quality browser wares (and even better community of hackers).

This has been an Igalia joint. Until next time!

optimizing let in spidermonkey
2013-12-18
https://wingolog.org/2013/12/18/optimizing-let-in-spidermonkey

Peoples! Firefox now optimizes let-bound variables!

What does this mean, you ask? Well, as you nerdy wingolog readers probably know, the new ECMAScript 6 standard is coming soon. ES6 has new facilities that make JavaScript more expressive. At the same time, though, many ES6 features stress parts of JavaScript engines that are currently ignored.

One of these areas is lexical scope. Until now, with two exceptions, all local variables in JavaScript have been "hoisted" to the function level. (The two exceptions are the exception in a catch clause, and the name of a named function expression.) Therefore JS engines have rightly focused on compiling a method at a time, falling back to slower execution strategies when lexical scopes are present.

The presence of let in ES6 changes this. It was a hip saying a couple years ago that "let is the new var", but in reality no one uses let -- not only because engines didn't yet implement let without feature flags, if they implemented it at all, but because let was slow. Because let-bound variables were on the fallback path, they were unoptimized.

But that is changing. Firefox now optimizes many kinds of let-bound variables, making "let is the new var" much closer to being an actual thing. V8 has made some excellent steps in similar directions as well. I'll focus on the Firefox bit here as that's what I've been working on, then go on to mention the work that the excellent V8 hackers have done.

implementing scope in javascript

JavaScript the language is defined in terms of a "scope chain", with respect to which local variables are looked up. Each link in the chain binds some set of variables. Lookup of a name traverses the chain from tip to tail looking for a binding.

Usually it's possible to know at compile-time where a binding will be in the chain. For example, you can know that the "x" reference in the returned function here:

function () {
  var x = 3;
  return function () { return x; };
}

will be one link up in the scope chain, and will be stored in the first named slot in that link. This is a classic implementation of lexical scope, and it plays very well with the semantics of JavaScript as specified. All engines do something like this.

Another thing to note in this case is that we don't need to store the name for "x" anywhere, as we can see through all of its uses at compile-time. In practice however, as JavaScript functions are typically compiled lazily on first call, we do store the association "x -> slot 0" in the outer function's environment.

(Could you just propagate the constant, you ask? Of course you could. But since JS compilation typically occurs one function at a time, lazily, no engine does this initially. Of course later when the optimizing compiler kicks in, it does get propagated, but that usually doesn't avoid the scope chain node allocation.)

Let's take another case.

function foo() { var x = 3; return x; }

In this case we know that the "x" reference can be found in the top link of the chain, in the first slot. Semantically speaking, that is. No JS engine implements things this way. Instead we take advantage of noticing that "x" cannot be captured by any other function. Therefore we are free to assign "x" to a slot in a stack frame, and refer to it by index directly -- without allocating a scope chain node on the heap. All engines do this.

And of course later if this function is hot, the constant exists only in a register, probably inlined to its caller. For this reason scope hasn't had a lot of work put into it. The unit of optimization -- the function -- is the same as the unit of scope. If the implementation details of scope get costly, we take advantage of run-time optimization to speculatively remove allocations, loads, and stores.

OK. What if you have some variables captured, and some not captured?

function foo() { var x = 3; var y = 4; return function () { return x; } }

Here "x" is captured, but "y" is not. Usually engines will allocate "x" on the scope chain, and "y" as a local. (JavaScriptCore is an interesting exception; it uses a strategy I haven't seen elsewhere called "lazy tear-off". Basically all variables are on the stack for the dynamic extent of the scope, whether they are captured or not. If needed, potentially lazily, an object is pushed on the scope chain that aliases the stack slots. If a scope is pushed on the scope chain, when the scope ends the current values are "torn off" the stack and copied to the heap, and the slots pointer of the scope object updated to point to the heap.)

I digress. "x" on the scope chain, "y" not. (I regress: why allocate "y" at all, isn't it dead? Yes it is dead. Optimizing compilers kill it. Baseline compilers don't bother.) So typically access to "x" goes through the scope chain, and access to "y" does not.

Calculating the set of variables that can be captured is one of the few static analyses done by JS baseline compilers. The other ones are whether we are in strict mode or not, whether nested scopes contain "with" or "eval" (thus forcing all locals to be on the scope chain), and whether the "arguments" object is used or not. There may be more, but that is the lowest common denominator of efficiency.

implementing let

let is part of ES6, and all browsers will eventually implement it.

Let me give you an insight as to the mindset of a browser maker. What a browser maker does when they see a feature is first to try to ignore it -- all features have a cost and not all of them pay for themselves. Next you try to do the minimum correct thing -- the thing that passes all the test suites, but imposes a minimal burden on the rest of the system. Usually this means that the feature probably works, except for the corner cases which users will file bugs for, but it is slow. Finally when there is either an internal use the browser maker cares about (Google, Mozilla to an extent) or you want to heat up a benchmark war (everyone else), you start to optimize.

The state of let was until recently between "ignore" and "slow". "Slow" means different things to different browsers.

The engines that have started to implement let are V8 (Chrome) and SpiderMonkey (Firefox). In Chrome, until recently using let in a function was an easy way of preventing that function from ever being optimized by Crankshaft. Good times? When writing this article I was going to claim this was still the case, but I see that in fact this is happily not true. You can Crankshaft a function that has let bindings. I believe, with a cursory glance, that locals that are not captured are still allocated on a scope chain -- but perhaps I am misreading things.

Firefox, on the other hand, would not Ion-compile any function with a let. You would think that this would not be the case, given that Firefox has had let for many years. If you thought this, you misunderstand how browser development works. Basically let me lay it out for you:

class struggle : marxism :: benchmarks : browsers

Get it? Benchmarks are the motor of browser history. If this bloviation has any effect, please let it be to inspire you to go and make benchmarks!

Anyway, Firefox. No Ion for let. The reason why you would avoid optimizing these things is clear -- breaking the "optimization unit == scope unit" equivalence has a cost. However the time has come.

What I have done is to recognize let scopes that have no captured locals. This class of let scope will now be optimized by Ion. So for example in this case:

function sumto(n) {
  let sum = 0;
  for (let i=0; i<n; i++)
    sum += i;
  return sum;
}

Previously this would allocate a block scope around the "for". (Incidentally, Firefox borks block scoping in this case; each iteration should produce a fresh binding. But this is historical, will be fixed, and I digress.) The operations that created and pushed block scopes were not optimized. Avoiding pushing and popping scopes from the scope chain avoids this limitation, and thus we have awesome speed!
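The fresh-binding point is observable when closures capture the loop variable; a small sketch of the spec behavior:

var closures = [];
for (let i = 0; i < 3; i++)
  closures.push(function () { return i; });

// Per the spec each iteration gets a fresh `i`, so this evaluates
// to [0, 1, 2]; with a single shared binding it would be [3, 3, 3].
closures.map(function (f) { return f(); });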

The effect of simply turning on Ion cannot be denied. In this case, the runtime of a sum up to 1e9 goes from 8.8 seconds (!) to 0.8 seconds. Obviously it's a little micro-benchmark but that's how I'm rolling today. The biggest optimization is to stop deoptimization. And here's where history cranks up: V8 takes 7.7 seconds on this loop. Gentlefolk, start your engines!

At the same time, in Firefox we are still able to reify a parallel chain of "debug scopes" as needed that corresponds to the scope chain as the specification would see it. This was the most challenging part of optimizing block scope in Firefox -- not the actual optimization, which was trivial, but optimization while retaining introspection and debuggability.

future work

Unhappily, scopes with let-bound variables that are captured by nested scopes ("aliased", in spidermonkey parlance) are not yet optimized. So not only do they cause scope chain allocation, but they also don't benefit from Ion. Boo. Bug 942810.

I should also mention that the let supported by Firefox is not the let specified in ES6. In ES6 there is thing called a "temporal dead zone" whereby it is invalid to access the value of a let before its initialization. This is like Scheme's "letrec restriction", and has to be enforced dynamically for similar reasons. V8 does a great job in actually implementing this, and Firefox should do it soon.
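Concretely, under the ES6 rules this program must throw, even though x is in scope for the whole block:

{
  // `x` is in scope here, but in its "temporal dead zone" until the
  // `let` initializer runs: this access must throw.
  console.log(x); // ReferenceError under ES6 semantics
  let x = 42;
}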

Of course, it's not clear to me how let can actually be deployed without breaking the tubes. I think it probably can somehow but I haven't checked the latest TC39 cogitations in that regard.

twisted paths

It's been a super-strange end-of-year. I was sure I would be optimizing SpiderMonkey generators and moving on to other things, but I just got caught up with stuff -- the for-of bits took approximately forever and then the amount of state carried in SpiderMonkey stack frames filled me with despair. So it was that this hack, while assisting me in that goal, was not actually a planned thing.

See, SpiderMonkey used to actually reserve a stack slot for the "block chain". No, they aren't using your browser to mine for Bitcoins, though that would be a neat hack. The "block chain" was a simulation of the spec-mandated scope chain. But of course this might not reflect the actual implemented, optimized behavior -- but again one might want to map them to each other for debugging purposes. It was a mess.

The resulting changeset to fix this ended up so large that it took weeks to land. Well, live and learn, right? I remember Michael Starzinger telling me the same thing about V8 -- you just have to keep your patches small, as small as possible, and always working. Words to the wise indeed.

happy days

But in the end we at least have some juice from this citric fruit. This has been an Igalia joint. Thanks very much to Mozilla's Luke Wagner for suffering through the reviews.

Thanks especially to Bloomberg for making this work possible; you folks are swell. Forward ES6!

es6 generators and iteration in spidermonkey
2013-10-07
https://wingolog.org/2013/10/07/es6-generators-and-iteration-in-spidermonkey

It is my great pleasure to announce that SpiderMonkey, the JavaScript engine of the Firefox browser, just got support for ES6 generators and iteration!

the chase

As I have written before, generators are functions that can suspend. SpiderMonkey has had generators since Firefox 2 (really!), but the new thing is that now their generators are standard, and available by default. No need to futz with Firefox-specific "language versions": there is just 1JS now.

My last article extolled the virtues of generators for asynchronous programs, and since then there have been a number of great articles and talks around the web on that topic. Generators are a kind of coroutine, and coroutines work well for asynchronous computation.

But that's not all that coroutines are good for. Another great application of generators is in iteration, where you have a producer that knows how to produce values, and a consumer that knows how to process them. Being able to write the producer as a coroutine is a great win.

words, words, words

This is a bit abstract, so let me show some code. Let's say you are implementing an auto-complete widget, so you use a trie. Here is a simple trie:

function Trie() {
  this.kids = Object.create(null);
  this.terminal = false;
}

Trie.prototype.insert = function (str) {
  if (str.length == 0) {
    this.terminal = true;
  } else {
    var node = this.kids[str[0]];
    if (!node)
      this.kids[str[0]] = node = new Trie;
    node.insert(str.slice(1));
  }
}

This kind of data structure is nice because it encodes the minimal amount of information. Unlike some of the examples on Wikipedia, the information from the prefix is not contained in the suffix.

Let's add some words:

var db = new Trie;
['fee', 'fie', 'fo', 'foo'].forEach(db.insert.bind(db));

Now, how do we get the completions?

Trie.prototype.completions = function (prefix) {
  // ?
}

It would be nice to avoid making an array of all of the possible completions, because who knows how many we're actually going to show. One of the whole points of using a trie was to avoid storing all of the strings into memory.

Generators offer a nice solution to this problem:

Trie.prototype.strings = function* () {
  if (this.terminal) yield '';
  for (var k in this.kids)
    for (var suffix of this.kids[k].completions(''))
      yield k + suffix;
}

Trie.prototype.completions = function* (prefix) {
  if (prefix.length) {
    var node = this.kids[prefix[0]];
    if (node !== undefined)
      for (var suffix of node.completions(prefix.slice(1)))
        yield prefix[0] + suffix;
  } else {
    yield* this.strings();
  }
}

Here you see that the information that is part of the prefix is now present implicitly, as part of the control stack. Operations on recursive data structures are naturally expressed with recursive functions.

Note that if I call db.strings(), nothing happens -- not even if there are millions of strings in the trie. Values get produced by the coroutine only as the consumer asks for them.
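You can see that laziness by driving the coroutine by hand; each next() call runs the generator only up to its next yield (shell output formatting approximate):

js> var it = db.strings();
js> it.next()
({value:"fee", done:false})
js> it.next()
({value:"fie", done:false})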

Speaking of consumers, "for-of" is usually the consumer that you want, as it is the built-in ES6 language construct for iteration. For example:

js> for (var word of db.completions('fo')) print(word)       
fo
foo
js> for (var word of db.completions('f')) print(word)  
fee
fie
fo
foo
js> for (var word of db.completions('q')) print(word)
js> 

This code is not just part of some future standard: it's available now in Firefox nightlies, out of the box. It's also available in Chrome, if you enable "experimental features".

niceties

What is also in Firefox, but not yet in Chrome, is support for custom iterators. Wouldn't it be nice if you could just say:

for (var word of db)
  print(word);

Well you can! But here it gets complicated.

The right-hand side of the for-of statement, in this case db, is the iterable. The iterator is derived from the iterable by calling the iterable's special @@iterator method. The @@ indicates that this method is keyed by a "well-known symbol".

Symbols are a new kind of data type in ES6, and their salient characteristic is that they can be used as unique property keys. You can imagine that in the future there is some piece of the JavaScript engine that, on startup, does something like this:

function Symbol() {}
Symbol.iterator = new Symbol;

and then uses that Symbol.iterator value as a property key.

However, Firefox does not implement symbols yet. In the meantime Firefox uses a special string as a stand-in for @@iterator. This name is not publicised anywhere, but you can get the value of @@iterator using this future-proof code:

const iterator = (function() {
  try {
    for (var _ of Proxy.create({get: function(_, name) { throw name; } }))
      break;
  } catch (name) {
    return name;
  }
  throw 'wat';
})();

However2! V8 implements for-of but not the @@iterator protocol, so if you target V8 also, you might find this gist to be useful.

Given iterator, we can now make Trie iterable directly:

Trie.prototype[iterator] = Trie.prototype.strings;

js> for (var word of db)
      print(word);
fee
fie
fo
foo

how did we get here?

That's the status: Firefox does ES6 generators and iterators. Excellent!

So, I have to toot some horns and ring some bells here. I implemented ES6 generators and iteration in SpiderMonkey as part of my work with Igalia, your friendly neighborhood browser consultancy. I did the same in V8 a few months ago.

It seems odd, on the face of it, to hack on core components of two major browser engines, while not being employed by either associated company. We think of Google, Mozilla, Apple, and Microsoft as the ones that are really "driving the web", either because it's strategic for them, or because they are shamed by their competitors into doing so. But that's not the totality of the browser ecosystem.

To develop the web platform, there is a simple condition: you have to ship a browser, or components of a browser (e.g. a JS engine). If you do that, you can usefully invest in improving your tools, to make them better for you in some way. The magical part is that when you improve a browser, you improve the web.

Since three of the four main browser engines are Free Software, it's possible for third parties that ship browsers to meaningfully contribute to them -- and happily, this improves not only their products but also the web as a whole.

So for ES6 generators and iteration, big ups to Bloomberg, who sponsored this work. More expressive language features helps them better develop their products, both on the server and client side. Because they deploy V8 and SpiderMonkey components, they were in this magical position of being able to usefully invest in the web platform, making it better for all of us. Thanks!

Thanks also to the great Mozilla folks, who were very kind, helpful, and forgiving of my mozilla-rookie mistakes. It was also a privilege to be able to get such a close insight into the workings of Firefox development. Thanks especially to Jason Orendorff and Jeff Walden for their detailed patch reviews, and to Till Schneidereit for enduring many stupid questions over IRC :) Y'all are great!

next

I have a richly variegated set of minor bugs to fix up with this work, so while I've still got this all paged in, do be sure to give a try to the Firefox nightlies and see if this works for you. File bugs in Bugzilla and copy me (:wingo) on them. Thanks in advance for submitting bugs there and not in the blog comments :) Happy hacking!

ecmascript generators from a performance perspective
2013-06-11
https://wingolog.org/2013/06/11/ecmascript-generators-from-a-performance-perspective

It's been really gratifying to see the recent interest in ES6 generators in V8. I'm still a Schemer at heart, but one thing all language communities should learn from JS is its enthusiasm and sense of joyful play. One can't spend much time around folks like James Long or Domenic Denicola without feeling the urge to go out and build something fun.

perfie perf perf perf

But this is a blog about nargery! A lot of people have been speculating about the performance impact of using generators, and I've been meaning to write about it, so perhaps this article will tickle your neuron.

A generator is a function that can suspend. So the first perf question to ask yourself is, does the performance of generators matter at all? If your generator spends more time suspended than active, then probably not. Though it's not the only application of generators, if you are using generators for asynchronous programming, then probably generator performance matters not at all.

But that's not very satisfying, right? Even though it probably doesn't matter, maybe it does, and then you need to make sure your mental model of what is fast and what is slow corresponds to reality.

crankshaft

So, the skinny: as implemented in V8, the primary perf impact of generators is that they do not get optimized by Crankshaft.

As you probably know, Crankshaft is V8's optimizing compiler. There are two ways a function can be optimized: on entry, because the function is called many times, or within a loop, because the loop is taking too long. In the first case it's fairly easy: you kick off a parallel task to recompile the function, reading the type feedback collected by the inline caches from the baseline compiler. Once you've made the optimized function, you swap the function's code pointer, and the next time you call the function you get the optimized version instead of the generic version.

The other way is if you are in a function, in a hot loop, and you decide to optimize the function while you are in it. This is called on-stack replacement (OSR), and is important for functions that are only called once but do have hot loops. (Cough KRAKEN cough.) It's trickier to do parallel compilation with OSR, as the function might have left the loop by the time recompilation finishes, but oh well.

So in summary, when V8 goes to optimize a function it will be in one of two states: inactive, because it hasn't started to run yet, or active, because it's stuck in a tight loop.

Generators introduce a new state for functions: suspended. You can know whether a function is active or inactive, but you can't know how many suspended generator activations are out in the JS heap. So you have multiple potential entry points to the function, which complicates the optimizer as it has to work on graphs with irreducible loops. Perhaps it's still possible to optimize, as OSR is a form of irreducible loops; I suspect though that you would end up compiling as many optimized functions as there are yield points, plus one for the function entry (and possibly another for OSR). Anyway it's a bit complicated.

Also, deoptimization (bailout) becomes more difficult if you can have suspended activations, and as a technical detail, it's tricky for V8 to efficiently capture the state of an activation in a way that the garbage collector can understand.

The upshot is that tight loops in generators won't run fast. If you need speed in a loop in a generator, you will need to call out to some other function (which itself would get optimized).
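For example (a sketch; the names are mine): keep the hot arithmetic in a plain function that the optimizer can handle, and let the generator just orchestrate:

// The generator itself stays on the baseline tier, but the inner
// loop lives in a plain function that Crankshaft can optimize.
function sumRange(data, start, end) {
  var sum = 0;
  for (var i = start; i < end; i++)
    sum += data[i];
  return sum;
}

function* chunkSums(data, size) {
  for (var i = 0; i < data.length; i += size)
    yield sumRange(data, i, Math.min(i + size, data.length));
}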

baseline

On the other hand, I should note that generators are optimized for quick suspend and resume. In a generator function, V8 stores local variables on the heap instead of on the stack. You would think this would make things slow, but that is not the case. Access to locals has the same speed characteristics: both stack and heap locals are accessed by index from a register (the stack pointer or the context pointer).

There is an allocation overhead when you enter a generator to create the heap storage, but it is almost never the case that you need to copy or allocate when suspending a generator. Since the locals are on the heap already, there is nothing to do! You just write the current instruction pointer (something you know at compile-time) into a slot in the generator object and return the result.

Similarly, when resuming you have to push on a few words to rebuild the stack frame, but after that's done you just jump back to where you were and all is good.

The exception to this case is when you yield and there are temporaries on the stack. Recall in my article on V8's baseline compiler that the full-codegen is a stack machine. It allocates slots to named locals, but temporary values go on the stack at run-time, and unfortunately V8's baseline compiler is too naive even to know what the stack height is at compile-time. So each suspension checks the stack height, and if it is nonzero, it calls out to a runtime helper to store the operands.

This run-time save would be avoidable in register machines like JavaScriptCore's baseline compiler, which avoids pushing and popping on the stack. Perhaps this note might spur the competitive urges of some fine JSC hacker, to show up V8's generators performance. I would accept JSC having faster generators if that's what it took to get generators into WebKit, at least for a while anyway :)

abstract musings

Many of you have read about the Futamura projections, in which one compiles a program via partial application of an interpreter with respect to a given input program. Though in the real world this does not appear to be a good strategy for implementing compilers, as so much of compilation is not simply about program structure but also about representation, there is a kernel of deep truth in this observation: the essence of compilation is partial evaluation of a strategy with respect to a particular program.

This observation most often helps me in the form of its converse. Whenever I find myself implementing some sort of generic evaluation strategy, there's a sign in my head that starts blinking the word "interpreter". (Kinda like the "on-air" signs in studios.) That means I'm doing something that's probably not as efficient as it could be. I find myself wondering how I can make the strategy specific to a particular program.

In this case, we can think of suspending and resuming not as copying a stack frame out and in, but instead compiling specific stubs to copy out only the values that are needed, in the representations that we know that they have. Instead of looping from the bottom to the top of a frame, you emit code for each element. In this way you get a specific program, and if you manage to get here, you have a program that can be inlined too: the polymorphic "next" reduces to a monomorphic call of a particular continuation.

Anyway, Amdahl's law limits the juice we can squeeze from this orange. For a tight for-of loop with a couple of optimizations that should land soon in V8, I see generator performance that is only 2 times slower than the corresponding hand-written iterators that use stateful closures or objects: 25 nanoseconds per suspend/resume versus 12 for a hand-written iterator.
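For reference, the comparison is against hand-written iterators of roughly this shape (my sketch, not the actual benchmark code):

// Generator version:
function* count(n) {
  for (var i = 0; i < n; i++)
    yield i;
}

// Hand-written equivalent with a stateful closure:
function countIter(n) {
  var i = 0;
  return { next: function () {
    return i < n ? { value: i++, done: false }
                 : { value: undefined, done: true };
  } };
}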

25 nanoseconds, people. You're using jQuery and asking for data from a server in the next country. It is highly unlikely that generators will be a bottleneck in your program :)