wingolog

a negative result

8 August 2023 9:03 AM (whippet | garbage collection | make)

Update: I am delighted to have been wrong! See the end.

Briefly, an interesting negative result: consider benchmarks b1, b2, b3 and so on, with associated .c and .h files. Consider libraries p and q, with their .c and .h files. You want to run each benchmark against each library.

P and Q implement the same API, but they have different ABIs: you need to separately compile each benchmark for each library. You also need to separately compile each library for each benchmark, because p.c also uses an abstract API implemented by b1.h, b2.h, and so on.

The problem: can you implement a short GNU Makefile that produces executables b1.p, b1.q, b2.p, b2.q, and so on?

The answer would appear to be "no".

You might think that with call and all the other functions available to you, surely this could be done, and indeed it's easy to take the cross product of two lists. But what we need are new rules, not just new text or variables, and you can't programmatically create rules. So we have to look at rules to see what facilities are available.

Consider the rules for one target:

b1.p.lib.o: p.c
	$(CC) -c -o $@ -include b1.h $<
b1.p.bench.o: b1.c
	$(CC) -c -o $@ -include p.h $<
b1.p: b1.p.lib.o b1.p.bench.o
	$(CC) -o $@ $^

With pattern rules, you can easily modify these rules to parameterize either over benchmark or over library, but not both. What you want is something like:

*.%.lib.o: %.c
	$(CC) -c -o $@ -include $(call extract_bench,$@) $<
%.*.bench.o: %.c
	$(CC) -c -o $@ -include $(call extract_lib,$@) $<
%: %.lib.o %.bench.o
	$(CC) -o $@ $^

But that doesn't work: you can't have a wildcard (*) in the pattern rule. (Really you would like to be able to match multiple patterns, but the above is the closest thing I can think of to what make has.)

Static pattern rules don't help: they are like pattern rules, but more precise as they apply only to a specific set of targets.

You might think that you could use $* or other special variables on the right-hand side of a pattern rule, but that's not the case.

You might think that secondary expansion might help you, but then you open the door to an annoying set of problems: sure, you can mix variable uses that are intended to be expanded once with those to be expanded twice, but the former set had better be idempotent upon second expansion, or things will go weird!

Perhaps the best chance for a make-only solution would be to recurse on generated makefiles, but that seems to be quite beyond the pale.

To be concrete, I run into this case when benchmarking Whippet: there are some number of benchmarks, and some number of collector configurations. Benchmark code will inline code from collectors, from their header files; and collectors will inline code from benchmarks, to implement the trace-all-the-edges functionality.

So, with Whippet I am left with the strange conclusion that the only reasonable thing is to generate the Makefile with a little custom generator, or at least generate the part of it to do this benchmark-library cross product. It's hard to be certain about negative results with make; perhaps there is a trick. If so, do let me know!
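For concreteness, here is a sketch of what such a little generator could look like. This is plain Python using the b1/p names from the example above, not Whippet's actual build machinery:

```python
# Sketch of a makefile-fragment generator for the benchmark × library
# cross product. The names (b1, p, q, benchmarks.mk) follow the running
# example above; a real build would differ in details.
benchmarks = ["b1", "b2", "b3"]
libs = ["p", "q"]

# One rule triple per (benchmark, library) pair; \t is the recipe tab
# that make requires.
rule = """\
{b}.{l}.lib.o: {l}.c
\t$(CC) -c -o $@ -include {b}.h $<
{b}.{l}.bench.o: {b}.c
\t$(CC) -c -o $@ -include {l}.h $<
{b}.{l}: {b}.{l}.lib.o {b}.{l}.bench.o
\t$(CC) -o $@ $^
"""

with open("benchmarks.mk", "w") as f:
    for b in benchmarks:
        for l in libs:
            f.write(rule.format(b=b, l=l))
```

A single `include benchmarks.mk` in the main Makefile (plus a rule to regenerate it when the generator changes) then picks the rules up.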

epilogue

Thanks to a kind note from Alexander Monakov, I am very happy to be proven wrong.

See, I thought that make functions were only really good in variables and rules and the like, and couldn't be usefully invoked "at the top level", to define new rules. But that's not the case! eval in particular can define new rules.

So a solution with eval might look something like this:

BENCHMARKS=b1 b2 b3
LIBS=p q

define template
$(1).$(2).lib.o: $(2).c
	$$(CC) -c -o $$@ -include $(1).h $$<
$(1).$(2).bench.o: $(1).c
	$$(CC) -c -o $$@ -include $(2).h $$<
$(1).$(2): $(1).$(2).lib.o $(1).$(2).bench.o
	$$(CC) -o $$@ $$^
endef

$(foreach BENCHMARK,$(BENCHMARKS),\
  $(foreach LIB,$(LIBS),\
     $(eval $(call template,$(BENCHMARK),$(LIB)))))

Thank you, Alexander!

on « the ministry for the future »

25 July 2023 7:49 PM (politics | climate | socialism)

Good evening, comrades. This evening, words on reading.

obtaining readables

Since the pandemic, or maybe a bit before, I picked up a new habit: if I hear of a book which is liked by someone I find interesting, I buy a copy.

Getting to this point required some mental changes. When I was younger, buying lots of books wasn’t really thinkable; as a student I was well-served by libraries in my mother tongue, and never picked up the habit of allocating my meager disposable income to books. I did like bookstores, especially used bookstores, but since leaving university I emigrated to countries that spoke different languages; the crate-digging avenue was less accessible to me.

I also had to make peace with capitalism, or as the comic says it, “participating in society”. We all know that Amazon imposes a dialectic on the written word: it makes it astonishingly easy and cheap to obtain anything you want, but also makes it hard to earn a living from writing, producing, and distributing books; consuming words via Amazon undermines the production of the words you want to read.

So I didn’t. Participate, I mean; didn’t order off Amazon. Didn’t read much either. At this point in my life, though, I see it everywhere: purity is the opposite of praxis. I still try to avoid Amazon, quixotically ordering mostly off AbeBooks, a now-fully-owned subsidiary of Amazon which fulfills via independent bookstores, but not avoiding Amazon entirely.

If you have access to and a habit of borrowing via a library in your language, you are almost certainly a better reader than I. But for the rest of us, and if you have the cash, maybe you too can grant yourself permission: when you hear of an interesting book, just buy it.

the ministry for the future

It was surely in this way that I heard of Kim Stanley Robinson’s « The Ministry for the Future ». My copy sat around for a year or so before I got around to it. We all hear the doom and gloom about the climate, and it’s all true; at least, everything about the present. But so much of it comes to us in a way that is fundamentally disempowering, that there is nothing to be done. I find this to be painful, viscerally; I prefer no news rather than bad news that I can’t act on, or at least incorporate into a longer narrative of action.

This may be a personal failing, or a defect. I have juvenile cultural tastes: I don’t like bad things to happen to good people. Jane Eyre was excruciating: to reach an ultimately heavenly resolution, as a reader I had to suffer with every step. Probably this is recovering-Catholic scar damage.

In any case, climate fiction tends to be more hell than heaven, and I hate it all the more for another reason, that we are here in this moment and we can do things: why tell us that nothing matters? Why would you write a book like that? What part of praxis is that?

Anyway it must have been the premise that made Robinson’s book pass my pre-filters: that the COP series of meetings established a new UN agency that was responsible for the interests of future generations, notably as regards climate. The book charts the story of this administration from its beginnings over a period of some 40 years or so. It’s a wonderfully human, gripping story, and we win.

We don’t win inevitably; the book isn’t a prediction, but more of a prophecy in a sense: an exercise in narrative genesis, a precursor to history, a seed: one not guaranteed to germinate, but which might.

At this point I find the idea of actually participating in a narrative to be almost foreign, or rather alienated. When I was younger, the talk was “after the revolution” this, or “in the future” that, but with enough time and deracination, I lost my vision of how to get there from here. I don’t have it back, not yet anyway, but I think it’s coming. Story precedes history; storytellers are the ones that can breathe life into the airships of the mind. Robinson has both the airships and the gift to speak them into existence. Check it out, comrades, it is required reading!

parallel futures in mobile application development

15 June 2023 2:02 PM (igalia | compilers | mobile | ionic | capacitor | react | react native | nativescript | flutter | dart | javascript | rust | webassembly | gc | webgpu | webhid | aria | scripting)

Good morning, hackers. Today I'd like to pick up my series on mobile application development. To recap, we looked at:

  • Ionic/Capacitor, which makes mobile app development more like web app development;

  • React Native, a flavor of React that renders to platform-native UI components rather than the Web, with ahead-of-time compilation of JavaScript;

  • NativeScript, which exposes all platform capabilities directly to JavaScript and lets users layer their preferred framework on top;

  • Flutter, which bypasses the platform's native UI components to render directly using the GPU, and uses Dart instead of JavaScript/TypeScript; and

  • Ark, which is Flutter-like in its rendering, but programmed via a dialect of TypeScript, with its own multi-tier compilation and distribution pipeline.

Taking a step back, with the exception of Ark which has a special relationship to HarmonyOS and Huawei, these frameworks are all layers on top of what is provided by Android or iOS. Why would you do that? Presumably there are benefits to these interstitial layers; what are they?

Probably the most basic answer is that an app framework layer offers the promise of abstracting over the different platforms. This way you can just have one mobile application development team instead of two or more. In practice you still need to test on iOS and Android at least, but this is cheaper than having fully separate Android and iOS teams.

Given that we are abstracting over platforms, it is natural also to abandon platform-specific languages like Swift or Kotlin. This is the moment in the strategic planning process that unleashes chaos: there is a fundamental element of randomness and risk when choosing a programming language and its community. Languages exist on a hype and adoption cycle; ideally you want to catch one on its way up, and you want it to remain popular over the life of your platform (10 years or so). This is not an easy thing to do and it's quite possible to bet on the wrong horse. However the communities around popular languages also bring their own risks, in that they have fashions that change over time, and you might have to adapt your platform to the language as fashions come and go, whether or not these fashions actually make better apps.

Choosing JavaScript as your language places more emphasis on the benefits of popularity, and is in turn a promise to adapt to ongoing fads. Choosing a more niche language like Dart places more emphasis on predictability of where the language will go, and ability to shape the language's future; Flutter is a big fish in a small pond.

There are other language choices, though; if you are building your own thing, you can choose any direction you like. What if you used Rust? What if you doubled down on WebAssembly, somehow? In some ways we'll never know unless we go down one of these paths; one has to pick a direction and stick to it for long enough to ship something, and endless tergiversations on such basic questions as language are not helpful. But in the early phases of platform design, all is open, and it would be prudent to spend some time thinking about what it might look like in one of these alternate worlds. In that spirit, let us explore these futures to see how they might be.

alternate world: rust

The arc of history bends away from C and C++ and towards Rust. Given that a mobile development platform has to have some low-level code, there are arguments in favor of writing it in Rust already instead of choosing to migrate in the future.

One advantage of Rust is that programs written in it generally have fewer memory-safety bugs than their C and C++ counterparts, which is important in the context of smart phones that handle untrusted third-party data and programs, i.e., web sites.

Also, Rust makes it easy to write parallel programs. For the same implementation effort, we can expect Rust programs to make more efficient use of the hardware than C++ programs.

And relative to JavaScript et al, Rust also has the advantage of predictable performance: it requires quite a good ahead-of-time compiler, but no adaptive optimization at run-time.

These observations are just conversation-starters, though, and when it comes to imagining what a real mobile device would look like with a Rust application development framework, things get more complicated. Firstly, there is the approach to UI: how do you get pixels on the screen and events from the user? The three general solutions are to use a web browser engine, to use platform-native widgets, or to build everything in Rust using low-level graphics primitives.

The first approach is taken by the Tauri framework: an app is broken into two pieces, a Rust server and an HTML/JS/CSS front-end. Running a Tauri app creates a WebView in which to run the front-end, and establishes a bridge between the web client and the Rust server. In many ways the resulting system ends up looking a lot like Ionic/Capacitor, and many of the UI questions are left open to the user: what UI framework to use, all of the JavaScript programming, and so on.

Instead of using a platform's WebView library, a Rust app could instead ship a WebView. This would of course make the application binary size larger, but tighter coupling between the app and the WebView may allow you to run the UI logic from Rust itself instead of having a large JS component. Notably this would be an interesting opportunity to adopt the Servo web engine, which is itself written in Rust. Servo is a project that in many ways exists in potentia; with more investment it could become a viable alternative to Gecko, Blink, or WebKit, and whoever does the investment would then be in a position of influence in the web platform.

If we look towards the platform-native side, though there are quite a number of Rust libraries that provide wrappers to native widgets, practically all of these primarily target the desktop. Only cacao supports iOS widgets, and there is no equivalent binding for Android, so any NativeScript-like solution in Rust would require a significant amount of work.

In contrast, the ecosystem of Rust UI libraries that are implemented on top of OpenGL and other low-level graphics facilities is much more active and interesting. Probably the best recent overview of this landscape is by Raph Levien (see the "quick tour of existing architectures" subsection). In summary, everything is still in motion and there is no established consensus as to how to approach the problem of UI development, but there are many interesting experiments in progress. With my engineer hat on, exploring these directions looks like fun. As Raph notes, some degree of exploration seems necessary as well: we will only know if a given approach is a good idea if we spend some time with it.

However if instead we consider the situation from the perspective of someone building a mobile application development framework, Rust seems more of a mid/long-term strategy than a concrete short-term option. Sure, build low-level libraries in Rust, to the extent possible, but there is no compelling-in-and-of-itself story yet that you can sell to potential UI developers, because everything is still so undecided.

Finally, let us consider the question of scripting: sometimes you need to add logic to a program at run-time. It could be because actually most of your app is dynamic and comes from the network; in that case your app is like a little virtual machine. If your app development framework is written in JavaScript, like Ionic/Capacitor, then you have a natural solution: just serve JavaScript. But if your app is written in Rust, what do you do? Waiting until the app store pushes a new version of the app to the user is not an option.

There would appear to be three common solutions to this problem. One is to use JavaScript -- that's what Servo does, for example. As a web engine, Servo doesn't have much of a choice, but the point stands. Currently Servo embeds a copy of SpiderMonkey, the JS engine from Firefox, and it does make sense for Servo to take advantage of an industrial, complete JS engine. Of course, SpiderMonkey is written in C++; if there were a JS engine written in Rust, probably Rust programmers would prefer it. Also it would be fun to write, or rather, fun to start writing; reaching the level of ECMA-262 conformance of SpiderMonkey is at least a hundred-million-dollar project. Anyway what I am saying is that I understand why Boa was started, and I wish them the many millions of dollars needed to see it through to completion.

You are not obliged to script your app via JavaScript, of course; there are many languages out there that have "extending a low-level core" as one of their core use cases. I think the mitigated success that this approach has had over the years—who embeds Python into an iPhone app?—should probably rule out this strategy as a core part of an application development framework. Still, I should mention one Rust-specific option, Rhai; the pitch is that by being Rust-specific, you get more expressive interoperation between Rhai and Rust than you would between Rust and any other dynamic language. Still, it is not a solution that I would bet on: Rhai internalizes so many Rust concepts (notably around borrowing and lifetimes) that I think you have to know Rust to write effective Rhai, and knowing both is quite rare. Anyone who writes Rhai would probably rather be writing Rust, and that's not a good equilibrium.

The third option for scripting Rust is WebAssembly. We'll get to that in a minute.

alternate world: the web of pixels

Let's return to Flutter for a moment, if you will. Like the more active Rust GUI development projects, Flutter is an all-in-one rendering framework based on low-level primitives; all it needs is Vulkan or Metal or (soon) WebGPU, and it handles the rest, layering on opinionated patterns for how to build user interfaces. It didn't arrive at this state in a day, though. To hear Eric Seidel tell the story, Flutter began as a kind of "reset" for the Web, a conscious attempt to determine, from the pieces that compose the Web rendering stack, which ones enable smooth user interfaces and which ones get in the way. After taking away all of the parts they didn't need, Flutter wasn't left with much: just GPU texture layers, a low-level drawing toolkit, and the necessary bindings to input events. Of course what the application programmer sees is much more high-level, but underneath, these are the platform primitives that Flutter uses.

So, imagine you work at Google. You used to work on the web—maybe on WebKit and then Chrome like Eric, maybe on web standards—but you broke with this past to see what Flutter might become. Flutter works: great job everybody! The set of graphical and input primitives that you use is minimal enough that it is abstract by nature; it doesn't much matter whether you target iOS or Android, because the primitives will be there. But the web is still the web, and it is annoying, aesthetically speaking. Could we Flutter-ize the web? What would that mean?

That's exactly what former HTML specification editor and now Flutter team member Ian Hixie proposed this January in a brief manifesto, Towards a modern Web stack. The basic idea is that the web and thus the browser is, well, a bit much. Hixie proposed to start over, rebuilding the web on top of WebAssembly (for code), WebGPU (for graphics), WebHID (for input), and ARIA (for accessibility). Technically it's a very interesting proposition! After all, people that build complex web apps end up having to fight with the platform to get the results they want; if we can reorient them to focus on these primitives, perhaps web apps can compete better with native apps.

However if you game out what is being proposed, I have doubts. The existing web is largely HTML, with JavaScript and CSS as add-ons: a web of structured text. Hixie's flutterized web proposal, on the other hand, is a web of pixels. This has a number of implications. One is that each app has to ship its own text renderer and internationalization tables, which is a bit silly to say the least. And whereas we take it for granted that we can mouse over a web page and select its text, with a web of pixels it is much less obvious how that would happen. Hixie's proposal is that apps expose structure via ARIA, but as far as I understand there is no association between pixels and ARIA properties: the pixels themselves really have no built-in structure to speak of.

And of course unlike in the web of structured text, in a web of pixels it would be up to each app to actually describe its structure via ARIA: it's not a built-in part of the system. But if you combine this with the rendering story (here's WebGPU, now draw the rest of the owl), Hixie's proposal leaves a void for frameworks to fill between what the app developer wants to write (e.g. Flutter/Dart) and the platform (WebGPU/ARIA/etc).

I said before that I had doubts and indeed I have doubts about my doubts. I am old enough to remember when X11 apps on Unix desktops changed from having fonts rendered on the server (i.e. by the operating system) to having them rendered on the client (i.e. the app), which was associated with a similar kind of anxiety. There were similar factors at play: slow-moving standards (X11) and not knowing at build-time what the platform would actually provide (which X server would be in use, etc). But instead of using the server, you could just ship pixels, and that's how GNOME got good text rendering, with Pango and FreeType and fontconfig, and eventually HarfBuzz, the text shaper used in Chromium and Flutter and many other places. Client-side fonts not only enabled more complex text shaping but also eliminated some round-trips for text measurement during UI layout, which is a bit of a theme in this article series. So could it be that pixels instead of text does not represent an apocalypse for the web? I don't know.

Incidentally I cannot move on from this point without pointing out another narrative thread, which is that of continued human effort over time. Raph Levien, who I mentioned above as a Rust UI toolkit developer, actually spent quite some time doing graphics for GNOME in the early 2000s; I remember working with his libart_lgpl. Behdad Esfahbod, author of HarfBuzz, built many parts of the free software text rendering stack before moving on to Chrome and many other things. I think that if you work on this low level where you are constantly translating text to textures, the accessibility and interaction benefits of using a platform-provided text library start to fade: you are the boss of text around here and you can implement the needed functionality yourself. From this perspective, pixels don't represent risk at all. In the old days of GNOME 2, client-side font rendering didn't lead to bad UI or poor accessibility. To be fair, there were other factors pushing to keep work in a commons, as the actual text rendering libraries still tended to be shipped with the operating system as shared libraries. Would similar factors prevail in a statically-linked web of pixels?

In a way it's a moot question for us, because in this series we are focussing on native app development. So, if you ship a platform, should your app development framework look like the web-of-pixels proposal, or something else? To me it is clear that as a platform, you need more. You need a common development story for how to build user-facing apps: something that looks more like Flutter and less like the primitives that Flutter uses. Though you surely will include a web-of-pixels-like low-level layer, because you need it yourself, probably you should also ship shared text rendering libraries, to reduce the install size for each individual app.

And of course, having text as part of the system has the side benefit of making it easier to get users to install OS-level security patches: it is well-known in the industry that users will make time for the update if they get a new goose emoji in exchange.

alternate world: webassembly

Hark! Have you heard the good word? Have you accepted your Lord and savior, WebAssembly, into your heart? I jest; it does sometimes feel like messianic narratives surrounding WebAssembly prevent us from considering its concrete aspects. But despite the hype, WebAssembly is clearly a technology that will be a part of the future of computing. So let's dive in: what would it mean for a mobile app development platform to embrace WebAssembly?

Before answering that question, a brief summary of what WebAssembly is. WebAssembly 1.0 is a portable bytecode format that is a good compilation target for C, C++, and Rust. These languages have good compiler toolchains that can produce WebAssembly. The nice thing is that when you instantiate a WebAssembly module, it is completely isolated from its host: it can't harm the host (approximately speaking). All points of interoperation with the host are via copying data into memory owned by the WebAssembly guest; the compiler toolchains abstract over these copies, allowing a Rust-compiled-to-native host to call into a Rust-compiled-to-WebAssembly module using idiomatic Rust code.
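As a toy illustration of that copy discipline, here is plain Python standing in for a Wasm runtime; the Guest class and sum_bytes function are inventions for this sketch, not any real WebAssembly API:

```python
# Toy model of the host/guest boundary (NOT a real Wasm runtime): the
# guest owns a flat byte array (its "linear memory"), and the host can
# only pass data in and read results out by copying across it.
class Guest:
    def __init__(self, pages=1):
        # WebAssembly linear memory grows in 64 KiB pages.
        self.memory = bytearray(64 * 1024 * pages)

    # Stand-in for an exported guest function: sum n bytes at ptr.
    def sum_bytes(self, ptr, n):
        return sum(self.memory[ptr:ptr + n])

guest = Guest()
data = b"\x01\x02\x03\x04"
guest.memory[0:len(data)] = data        # host copies data *in*
result = guest.sum_bytes(0, len(data))  # guest computes over its own memory
print(result)  # 10
```

The point of the model is that the guest never sees a host pointer: everything crosses the boundary as bytes copied into memory the guest owns, which is what the toolchains hide behind idiomatic calls.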

So, WebAssembly 1.0 can be used as a way to script a Rust application. The guest script can be interpreted, compiled just in time, or compiled ahead of time for peak throughput.

Of course, people that would want to script an application probably want a higher-level language than Rust. In a way, WebAssembly is in a similar situation as WebGPU in the web-of-pixels proposal: it is a low-level tool that needs higher-level toolchains and patterns to bridge the gap between developers and primitives.

Indeed, the web-of-pixels proposal specifies WebAssembly as the compute primitive. The idea is that you ship your application as a WebAssembly module, and give that module WebGPU, WebHID, and ARIA capabilities via imports. Such a WebAssembly module doesn't script an existing application: it is the app. So another way for an app development platform to use WebAssembly would be like how the web-of-pixels proposes to do it: as an interchange format and as a low-level abstraction. As in the scripting case, you can interpret or compile the module. Perhaps an infrequently-run app would just be interpreted, to save on disk space, whereas a more heavily-used app would be optimized ahead of time, or something.

We should mention another interesting benefit of WebAssembly as a distribution format, which is that it abstracts over the specific chipset on the user's device; it's the device itself that is responsible for efficiently executing the program, possibly via compilation to specialized machine code. I understand for example that RISC-V people are quite happy about this property because it lowers the barrier to entry for them relative to an ARM monoculture.

WebAssembly does have some limitations, though. One is that if the throughput of data transfer between guest and host is high, performance can be bad due to copying overhead. The nascent memory-control proposal aims to provide an mmap capability, but it is still early days. The need to copy would be a limitation for using WebGPU primitives.

More generally, as an abstraction, WebAssembly may not be able to express programs in the most efficient way for a given host platform. For example, its SIMD operations work on 128-bit vectors, whereas host platforms may have much wider vectors. Any current limitation will recede with time, as WebAssembly gains new features, but every year brings new hardware capabilities (tensor operation accelerator, anyone?), so there will be some impedance-matching to do for the foreseeable future.

The more fundamental limitation of the 1.0 version of WebAssembly is that it's only a good compilation target for some languages. This is because some of the fundamental parts of WebAssembly that enable isolation between host and guest (structured control flow, opaque stack, no instruction pointer) make it difficult to efficiently implement languages that need garbage collection, such as Java or Go. The coming WebAssembly 2.0 starts to address this need by including low-level managed arrays and records, allowing for reasonable ahead-of-time compilation of languages like Java. Getting a dynamic language like JavaScript to compile to efficient WebAssembly can still be a challenge, though, because many of the just-in-time techniques needed to efficiently implement these languages will still be missing in WebAssembly 2.0.

Before moving on to WebAssembly as part of an app development framework, one other note: currently WebAssembly modules do not compose very well with each other and with the host, requiring extensive toolchain support to enable e.g. the use of any data type that's not a scalar integer or floating-point value. The component model working group is trying to establish some abstractions and associated tooling, but (again!) it is still early days. Anyone wading into this space needs to be prepared to get their hands dirty.

To return to the question at hand, an app development framework can use WebAssembly for scripting, though the problem of how to compose a host application with a guest script requires good tooling. Or, an app development framework that exposes a web-of-pixels primitive layer can support running WebAssembly apps directly, though again, the set of imports remains to be defined. Either of these two patterns can stick with WebAssembly 1.0 or also allow for garbage collection in WebAssembly 2.0, aiming to capture mindshare among a broader community of potential developers, potentially in a wide range of languages.

As a final observation: WebAssembly is ecumenical, in the sense that it favors no specific church of how to write programs. As a platform, though, you might prefer a state religion, to avoid wasting internal and external efforts on redundant or ill-advised development. After all, if it's your platform, presumably you know best.

summary

What is to be done?

Probably there are as many answers as people, but since this is my blog, here are mine:

  1. On the shortest time-scale I think that it is entirely reasonable to base a mobile application development framework on JavaScript. I would particularly focus on TypeScript, as late error detection is more annoying in native applications.

  2. I would build something that looks like Flutter underneath: reactive, based on low-level primitives, with a multithreaded rendering pipeline. Perhaps it makes sense to take some inspiration from WebF.

  3. In the medium-term I am sympathetic to Ark's desire to extend the language in a more ResultBuilder-like direction, though this is not without risk.

  4. Also in the medium-term I think that modifications to TypeScript to allow for sound typing could provide some of the advantages of Dart's ahead-of-time compiler to JavaScript developers.

  5. In the long term... well we can do all things with unlimited resources, right? So after solving climate change and homelessness, it makes sense to invest in frameworks that might be usable 3 or 5 years from now. WebAssembly in particular has a chance of sweeping across all platforms, and the primitives for the web-of-pixels will be present everywhere, so if you manage to produce a compelling application development story targeting those primitives, you could eat your competitors' lunch.

Well, friends, that brings this article series to an end; it has been interesting for me to dive into this space, and if you have read down to here, I can only think that you are a masochist or that you have also found it interesting. In either case, you are very welcome. Until next time, happy hacking.

approaching cps soup

20 May 2023 7:10 AM (guile | igalia | spritely | cps soup | cps | compilers | flow analysis)

Good evening, hackers. Today's missive is more of a massive, in the sense that it's another presentation transcript-alike; these things always translate to many vertical pixels.

In my defense, I hardly ever give a presentation twice, so not only do I miss out on the usual per-presentation cost amortization and on the incremental improvements of repetition, the more dire error is that whatever message I might have can only ever reach a subset of those that it might interest; here at least I can be more or less sure that if the presentation would interest someone, that they will find it.

So for the time being I will try to share presentations here, in the spirit of, well, why the hell not.

CPS Soup

A functional intermediate language

10 May 2023 – Spritely

Andy Wingo

Igalia, S.L.

Last week I gave a training talk to Spritely Institute collaborators on the intermediate representation used by Guile's compiler.

CPS Soup

Compiler: Front-end to Middle-end to Back-end

Middle-end spans gap between high-level source code (AST) and low-level machine code

Programs in middle-end expressed in intermediate language

CPS Soup is the language of Guile’s middle-end

An intermediate representation (IR) (or intermediate language, IL) is just another way to express a computer program. Specifically it's the kind of language that is appropriate for the middle-end of a compiler, and by "appropriate" I mean that an IR serves a purpose: there has to be a straightforward transformation to the IR from high-level abstract syntax trees (ASTs) from the front-end, and there has to be a straightforward translation from IR to machine code.

There is also usually a set of necessary source-to-source transformations on the IR to "lower" it, meaning to make it closer to the back-end than to the front-end. And there is usually a set of optional transformations to the IR to make the program run faster, allocate less memory, or be simpler: these are the optimizations.

"CPS soup" is Guile's IR. This talk presents the essentials of CPS soup in the context of more traditional IRs.

How to lower?

High-level:

(+ 1 (if x 42 69))

Low-level:

  cmpi $x, #f
  je L1
  movi $t, 42
  j L2  
L1:
  movi $t, 69
L2:
  addi $t, 1

How to get from here to there?

Before we dive in, consider what we might call the dynamic range of an intermediate representation: we start with what is usually an algebraic formulation of a program and we need to get down to a specific sequence of instructions operating on registers (unlimited in number, at this stage; allocating to a fixed set of registers is a back-end concern), with explicit control flow between them. What kind of a language might be good for this? Let's attempt to answer the question by looking into what the standard solutions are for this problem domain.

1970s

Control-flow graph (CFG)

graph := array<block>
block := tuple<preds, succs, insts>
inst  := goto B
       | if x then BT else BF
       | z = const C
       | z = add x, y
       ...

BB0: if x then BB1 else BB2
BB1: t = const 42; goto BB3
BB2: t = const 69; goto BB3
BB3: t2 = addi t, 1; ret t2

Assignment, not definition

Of course in the early days, there was no intermediate language; compilers translated ASTs directly to machine code. It's been a while since I dove into all this but the milestone I have in my head is that it's the 70s when compiler middle-ends come into their own, with Fran Allen's work on flow analysis and optimization.

In those days the intermediate representation for a compiler was a graph of basic blocks, but unlike today the paradigm was assignment to locations rather than definition of values. By that I mean that in our example program, we get t assigned to in two places (BB1 and BB2); the actual definition of t is implicit, as a storage location, and our graph consists of assignments to the set of storage locations in the program.
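To make the assignment-oriented reading concrete, here is a hypothetical Python sketch (mine, not from any real compiler) that interprets the four-block example above. Note how t is a mutable storage location that gets assigned in two different blocks:

```python
# A tiny interpreter for an assignment-oriented CFG, mirroring the
# BB0..BB3 example above.  "Variables" are mutable storage locations
# in an environment; blocks assign into them rather than defining
# values, and reading an unassigned location would yield garbage.

def run_cfg(x):
    env = {"x": x}          # storage locations, assigned in place
    label = "BB0"
    while True:
        if label == "BB0":
            label = "BB1" if env["x"] else "BB2"
        elif label == "BB1":
            env["t"] = 42   # assignment to location t
            label = "BB3"
        elif label == "BB2":
            env["t"] = 69   # another assignment to the same location
            label = "BB3"
        elif label == "BB3":
            env["t2"] = env["t"] + 1
            return env["t2"]
```

Running `run_cfg(True)` follows BB0, BB1, BB3 and yields 43; `run_cfg(False)` takes the other branch and yields 70.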

1980s

Static single assignment (SSA) CFG

graph := array<block>
block := tuple<preds, succs, phis, insts>
phi   := z := φ(x, y, ...)
inst  := z := const C
       | z := add x, y
       ...
BB0: if x then BB1 else BB2
BB1: v0 := const 42; goto BB3
BB2: v1 := const 69; goto BB3
BB3: v2 := φ(v0,v1); v3 := addi v2, 1; ret v3

Phi is phony function: v2 is v0 if coming from first predecessor, or v1 from second predecessor

These days we still live in Fran Allen's world, but with a twist: we no longer model programs as graphs of assignments, but rather graphs of definitions. The introduction in the mid-80s of so-called "static single-assignment" (SSA) form graphs means that instead of having two assignments to t, we would define two different values v0 and v1. Then later instead of reading the value of the storage location associated with t, we define v2 to be either v0 or v1: the former if we reach the use of t in BB3 from BB1, the latter if we are coming from BB2.

If you think on the machine level, in terms of what the resulting machine code will be, this either function isn't a real operation; probably register allocation will put v0, v1, and v2 in the same place, say $rax. The function linking the definition of v2 to the inputs v0 and v1 is purely notational; in a way, you could say that it is phony, or not real. But when the creators of SSA went to submit this notation for publication they knew that they would need something that sounded more rigorous than "phony function", so they instead called it a "phi" (φ) function. Really.

2003: MLton

Refinement: phi variables are basic block args

graph := array<block>
block := tuple<preds, succs, args, insts>

Inputs of phis implicitly computed from preds

BB0(a0): if a0 then BB1() else BB2()
BB1(): v0 := const 42; BB3(v0)
BB2(): v1 := const 69; BB3(v1)
BB3(v2): v3 := addi v2, 1; ret v3

SSA is still where it's at, as a conventional solution to the IR problem. There have been some refinements, though. I learned of one of them from MLton; I don't know if they were first but they had the idea of interpreting phi variables as arguments to basic blocks. In this formulation, you don't have explicit phi instructions; rather the "v2 is either v1 or v0" property is expressed by v2 being a parameter of a block which is "called" with either v0 or v1 as an argument. It's the same semantics, but an interesting notational change.
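A hypothetical sketch of the same example in the block-arguments style, where each block is "called" with the values that would otherwise have fed its phis:

```python
# MLton-style refinement: blocks take arguments, and transferring
# control to a block with values replaces explicit phi instructions.
# Each block returns (next_label, next_args); None means return.

def run_block_args(a0):
    def bb0(a0):
        return ("BB1", ()) if a0 else ("BB2", ())
    def bb1():
        return ("BB3", (42,))        # pass v0 as BB3's argument
    def bb2():
        return ("BB3", (69,))        # pass v1 as BB3's argument
    def bb3(v2):                     # v2 plays the role of the phi var
        return (None, (v2 + 1,))
    blocks = {"BB0": bb0, "BB1": bb1, "BB2": bb2, "BB3": bb3}
    label, args = "BB0", (a0,)
    while label is not None:
        label, args = blocks[label](*args)
    return args[0]
```

The semantics are the same as with explicit phis; what changed is only that "v2 is either v0 or v1" is now expressed by how BB3 is invoked.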

Refinement: Control tail

Often nice to know how a block ends (e.g. to compute phi input vars)

graph := array<block>
block := tuple<preds, succs, args, insts,
               control>
control := if v then L1 else L2
         | L(v, ...)
         | switch(v, L1, L2, ...)
         | ret v

One other refinement to SSA is to note that basic blocks consist of some number of instructions that can define values or have side effects but which otherwise exhibit fall-through control flow, followed by a single instruction that transfers control to another block. We might as well store that control instruction separately; this would let us easily know how a block ends, and in the case of phi block arguments, easily say what values are the inputs of a phi variable. So let's do that.

Refinement: DRY

Block successors directly computable from control

Predecessors graph is inverse of successors graph

graph := array<block>
block := tuple<args, insts, control>

Can we simplify further?

At this point we notice that we are repeating ourselves; the successors of a block can be computed directly from the block's terminal control instruction. Let's drop those as a distinct part of a block, because when you transform a program it's unpleasant to have to needlessly update something in two places.

While we're doing that, we note that the predecessors array is also redundant, as it can be computed from the graph of block successors. Here we start to wonder: am I simplifying or am I removing something that is fundamental to the algorithmic complexity of the various graph transformations that I need to do? We press on, though, hoping we will get somewhere interesting.

Basic blocks are annoying

Ceremony about managing insts; array or doubly-linked list?

Nonuniformity: “local” vs “global” transformations

Optimizations transform graph A to graph B; mutability complicates this task

  • Desire to keep A in mind while making B
  • Bugs because of spooky action at a distance

Recall that the context for this meander is Guile's compiler, which is written in Scheme. Scheme doesn't have expandable arrays built-in. You can build them, of course, but it is annoying. Also, in Scheme-land, functions with side-effects are conventionally suffixed with an exclamation mark; after too many of them, both the writer and the reader get fatigued. I know it's a silly argument but it's one of the things that made me grumpy about basic blocks.

If you permit me to continue with this introspection, I find there is an uneasy relationship between instructions and locations in an IR that is structured around basic blocks. Do instructions live in a function-level array, with a basic block being an array of instruction indices? How do you get from an instruction to its basic block? How would you hoist an instruction to another basic block? Might you need to reallocate the block itself?

And when you go to transform a graph of blocks... well how do you do that? Is it in-place? That would be efficient; but what if you need to refer to the original program during the transformation? Might you risk reading a stale graph?

It seems to me that there are too many concepts, that in the same way that SSA itself moved away from assignment to a more declarative language, that perhaps there is something else here that might be more appropriate to the task of a middle-end.

Basic blocks, phi vars redundant

Blocks: label with args sufficient; “containing” multiple instructions is superfluous

Unify the two ways of naming values: every var is a phi

graph := array<block>
block := tuple<args, inst>
inst  := L(expr)
       | if v then L1() else L2()
       ...
expr  := const C
       | add x, y
       ...

I took a number of tacks here, but the one I ended up on was to declare that basic blocks themselves are redundant. Instead of containing an array of instructions with fallthrough control-flow, why not just make every instruction a control instruction? (Yes, there are arguments against this, but do come along for the ride, we get to a funny place.)

While you are doing that, you might as well unify the two ways in which values are named in an MLton-style compiler: instead of distinguishing between basic block arguments and values defined within a basic block, we might as well make all names into basic block arguments.

Arrays annoying

Array of blocks implicitly associates a label with each block

Optimizations add and remove blocks; annoying to have dead array entries

Keep labels as small integers, but use a map instead of an array

graph := map<label, block>

In the traditional SSA CFG IR, a graph transformation would often not touch the structure of the graph of blocks. But now having given each instruction its own basic block, we find that transformations of the program necessarily change the graph. Consider an instruction that we elide; before, we would just remove it from its basic block, or replace it with a no-op. Now, we have to find its predecessor(s), and forward them to the instruction's successor. It would be useful to have a more capable data structure to represent this graph. We might as well keep labels as being small integers, but allow for sparse maps and growth by using an integer-specialized map instead of an array.

This is CPS soup

graph := map<label, cont>
cont  := tuple<args, term>
term  := continue to L
           with values from expr
       | if v then L1() else L2()
       ...
expr  := const C
       | add x, y
       ...

SSA is CPS

This is exactly what CPS soup is! We came at it "from below", so to speak; instead of the heady fumes of the lambda calculus, we get here from down-to-earth basic blocks. (If you prefer the other way around, you might enjoy this article from a long time ago.) The remainder of this presentation goes deeper into what it is like to work with CPS soup in practice.
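To make the shapes above concrete, here is a purely illustrative Python rendering (the tags and layout are mine, not Guile's), together with a small interpreter for the running example:

```python
# A CPS-soup-shaped program: a map from integer labels to conts.
# A cont is (params, term); a term is one of:
#   ("branch", kf, kt, var)       if var then kt() else kf()
#   ("continue", k, expr)         continue to k with expr's values
#   ("ret", var)                  leave the graph
# An expr is ("const", C) or ("addi", var, immediate).

def run_soup(program, entry, args):
    env = {}
    label, vals = entry, args
    while True:
        params, term = program[label]
        env.update(zip(params, vals))   # bind incoming values
        if term[0] == "branch":
            _, kf, kt, v = term
            label, vals = (kt if env[v] else kf), ()
        elif term[0] == "continue":
            _, k, expr = term
            if expr[0] == "const":
                vals = (expr[1],)
            elif expr[0] == "addi":
                vals = (env[expr[1]] + expr[2],)
            label = k
        elif term[0] == "ret":
            return env[term[1]]

example = {
    0: ((0,), ("branch", 2, 1, 0)),             # cont 0 binds a0 (var 0)
    1: ((), ("continue", 3, ("const", 42))),
    2: ((), ("continue", 3, ("const", 69))),
    3: ((1,), ("continue", 4, ("addi", 1, 1))),  # var 1 is v2, the phi
    4: ((2,), ("ret", 2)),
}
```

Every instruction is its own labelled cont, and every name is bound as a cont parameter; there is nothing else.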

Scope and dominators

BB0(a0): if a0 then BB1() else BB2()
BB1(): v0 := const 42; BB3(v0)
BB2(): v1 := const 69; BB3(v1)
BB3(v2): v3 := addi v2, 1; ret v3

What vars are “in scope” at BB3? a0 and v2.

Not v0; not all paths from BB0 to BB3 define v0.

a0 always defined: its definition dominates all uses.

BB0 dominates BB3: All paths to BB3 go through BB0.

Before moving on, though, we should discuss what it means in an SSA-style IR that variables are defined rather than assigned. If you consider variables as locations to which values can be assigned and which initially hold garbage, you can read them at any point in your program. You might get garbage, though, if the variable wasn't assigned something sensible on the path that led to reading the location's value. It sounds bonkers but it is still the C and C++ semantic model.

If we switch instead to a definition-oriented IR, then a variable never has garbage; the single definition always precedes any uses of the variable. That is to say that all paths from the function entry to the use of a variable must pass through the variable's definition, or, in the jargon, that definitions dominate uses. This is an invariant of an SSA-style IR, that all variable uses be dominated by their associated definition.

You can flip the question around to ask what variables are available for use at a given program point, which might be read equivalently as which variables are in scope; the answer is, all definitions from all program points that dominate the use site. The "CPS" in "CPS soup" stands for continuation-passing style, a dialect of the lambda calculus, which also has a history of use as a compiler intermediate representation. But it turns out that if we use the lambda calculus in its conventional form, we end up needing to maintain a lexical scope nesting at the same time that we maintain the control-flow graph, and the lexical scope tree can fail to reflect the dominator tree. I go into this topic in more detail in an old article, and if it interests you, please do go deep.
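The dominance relation can be computed with the standard iterative algorithm; here is a hedged sketch (my own formulation, not Guile's implementation) run over the example graph:

```python
# Iterative dominator computation: dom[n] is the set of nodes that
# dominate n.  Start every non-entry node at "all nodes" (top) and
# shrink to a fixed point: a node's dominators are itself plus the
# intersection of its predecessors' dominators.

def dominators(succs, entry):
    nodes = set(succs)
    preds = {n: set() for n in nodes}
    for n, ss in succs.items():
        for s in ss:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | (set.intersection(*(dom[p] for p in preds[n]))
                         if preds[n] else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

cfg = {"BB0": ["BB1", "BB2"], "BB1": ["BB3"], "BB2": ["BB3"], "BB3": []}
dom = dominators(cfg, "BB0")
```

As the slide says: BB0 dominates BB3 (every path to BB3 goes through it), but BB1 does not, which is why v0 is not in scope at BB3.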

CPS soup in Guile

Compilation unit is intmap of label to cont

cont := $kargs names vars term
      | ...
term := $continue k src expr
      | ...
expr := $const C
      | $primcall 'add #f (a b)
      | ...

Conventionally, entry point is lowest-numbered label

Anyway! In Guile, the concrete form that CPS soup takes is that a program is an intmap of label to cont. A cont is the smallest labellable unit of code. You can call them blocks if that makes you feel better. One kind of cont, $kargs, binds incoming values to variables. It has a list of variables, vars, and also has an associated list of human-readable names, names, for debugging purposes.

A $kargs contains a term, which is like a control instruction. One kind of term is $continue, which passes control to a continuation k. Using our earlier language, this is just goto k, with values, as in MLton. (The src is a source location for the term.) The values come from the term's expr, of which there are a dozen kinds or so, for example $const which passes a literal constant, or $primcall, which invokes some kind of primitive operation, which above is add. The primcall may have an immediate operand, in this case #f, and some variables that it uses, in this case a and b. The number and type of the produced values is a property of the primcall; some are just for effect, some produce one value, some more.

CPS soup

term := $continue k src expr
      | $branch kf kt src op param args
      | $switch kf kt* src arg
      | $prompt k kh src escape? tag
      | $throw src op param args

Expressions can have effects, produce values

expr := $const val
      | $primcall name param args
      | $values args
      | $call proc args
      | ...

There are other kinds of terms besides $continue: there is $branch, which proceeds either to the false continuation kf or the true continuation kt depending on the result of performing op on the variables args, with immediate operand param. In our running example, we might have made the initial term via:

(build-term
  ($branch BB1 BB2 'false? #f (a0)))

The definition of build-term (and build-cont and build-exp) is in the (language cps) module.

There is also $switch, which takes an unboxed unsigned integer arg and performs an array dispatch to the continuations in the list kt, or kf otherwise.

There is $prompt which continues to its k, having pushed on a new continuation delimiter associated with the var tag; if code aborts to tag before the prompt exits via an unwind primcall, the stack will be unwound and control passed to the handler continuation kh. If escape? is true, the continuation is escape-only and aborting to the prompt doesn't need to capture the suspended continuation.

Finally there is $throw, which doesn't continue at all, because it causes a non-resumable exception to be thrown. And that's it; it's just a handful of kinds of term, determined by the different shapes of control-flow (how many continuations the term has).

When it comes to values, we have about a dozen expression kinds. We saw $const and $primcall, but I want to explicitly mention $values, which simply passes on some number of values. Often a $values expression corresponds to passing an input to a phi variable, though $kargs vars can get their definitions from any expression that produces the right number of values.

Kinds of continuations

Guile functions untyped, can return multiple values

Error if too few values, possibly truncate too many values, possibly cons as rest arg...

Calling convention: contract between val producer & consumer

  • both on call and return side

Continuation of $call unlike that of $const

When a $continue term continues to a $kargs with a $const 42 expression, there are a number of invariants that the compiler can ensure: that the $kargs continuation is always passed the expected number of values, that the vars that it binds can be allocated to specific locations (e.g. registers), and that because all predecessors of the $kargs are known, that those predecessors can place their values directly into the variable's storage locations. Effectively, the compiler determines a custom calling convention between each $kargs and its predecessors.

Consider the $call expression, though; in general you don't know what the callee will do to produce its values. You don't even generally know that it will produce the right number of values. Therefore $call can't (in general) continue to $kargs; instead it continues to $kreceive, which expects the return values in well-known places. $kreceive will check that it is getting the right number of values and then continue to a $kargs, shuffling those values into place. A standard calling convention defines how functions return values to callers.

The conts

cont := $kfun src meta self ktail kentry
      | $kclause arity kbody kalternate
      | $kargs names syms term
      | $kreceive arity kbody
      | $ktail

$kclause, $kreceive very similar

Continue to $ktail: return

$call and return (and $throw, $prompt) exit first-order flow graph

Of course, a $call expression could be a tail-call, in which case it would continue instead to $ktail, indicating an exit from the first-order function-local control-flow graph.

The calling convention also specifies how to pass arguments to callees, and likewise those continuations have a fixed calling convention; in Guile we start functions with $kfun, which has some metadata attached, and then proceed to $kclause which bridges the boundary between the standard calling convention and the specialized graph of $kargs continuations. (Many details of this could be tweaked, for example that the case-lambda dispatch built-in to $kclause could instead dispatch to distinct functions instead of to different places in the same function; historical accidents abound.)

As a detail, if a function is well-known, in that all its callers are known, then we can lighten the calling convention, moving the argument-count check to callees. In that case $kfun continues directly to $kargs. Similarly for return values, optimizations can make $call continue to $kargs, though there is still some value-shuffling to do.

High and low

CPS bridges AST (Tree-IL) and target code

High-level: vars in outer functions in scope

Closure conversion between high and low

Low-level: Explicit closure representations; access free vars through closure

CPS soup is the bridge between parsed Scheme and machine code. It starts out quite high-level, notably allowing for nested scope, in which expressions can directly refer to free variables. Variables are small integers, and for high-level CPS, variable indices have to be unique across all functions in a program. CPS gets lowered via closure conversion, which chooses specific representations for each closure that remains after optimization. After closure conversion, all variable access is local to the function; free variables are accessed via explicit loads from a function's closure.
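As a hand-written illustration of what closure conversion does (this is a sketch in Python, not Guile's actual output or representation), compare direct free-variable reference with an explicit closure record:

```python
# Before closure conversion (high-level): the inner function refers
# directly to the free variable n from its enclosing scope.
def make_adder_high(n):
    def add(x):
        return x + n              # free reference to n
    return add

# After closure conversion (low-level, sketched by hand): a closure is
# an explicit record of code plus captured values, and the code
# accesses its free variables through that record.
def adder_code(closure, x):
    n = closure[1]                # explicit load from the closure
    return x + n

def make_adder_low(n):
    return (adder_code, n)        # closure record: (code, captured n)

def call_closure(closure, *args):
    return closure[0](closure, *args)
```

After conversion, all variable access in adder_code is local: the free variable arrives via an explicit load, so no nested scope remains.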

Optimizations at all levels

Optimizations before and after lowering

Some exprs only present in one level

Some high-level optimizations can merge functions (higher-order to first-order)

Because of the broad remit of CPS, the language itself has two dialects, high and low. The high level dialect has cross-function variable references, first-class abstract functions (whose representation hasn't been chosen), and recursive function binding. The low-level dialect has only specific ways to refer to functions: labels and specific closure representations. It also includes calls to function labels instead of just function values. But these are minor variations; some optimization and transformation passes can work on either dialect.

Practicalities

Intmap, intset: Clojure-style persistent functional data structures

Program: intmap<label,cont>

Optimization: program→program

Identify functions: (program,label)→intset<label>

Edges: intmap<label,intset<label>>

Compute succs: (program,label)→edges

Compute preds: edges→edges

I mentioned that programs were intmaps, and specifically in Guile they are Clojure/Bagwell-style persistent functional data structures. By functional I mean that intmaps (and intsets) are values that can't be mutated in place (though we do have the transient optimization).

I find that immutability has the effect of deploying a sense of calm to the compiler hacker -- I don't need to worry about data structures changing out from under me; instead I just structure all the transformations that I need to do as functions. An optimization is just a function that takes an intmap and produces another intmap. An analysis associating some data with each program label is just a function that computes an intmap, given a program; that analysis will never be invalidated by subsequent transformations, because the program to which it applies will never be mutated.

This pervasive feeling of calm allows me to tackle problems that I wouldn't have otherwise been able to fit into my head. One example is the novel online CSE pass; one day I'll either wrap that up as a paper or just capitulate and blog it instead.
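To illustrate the "an optimization is just a function" point with a toy (and entirely hypothetical) pass: the input program is never mutated; the pass simply computes a new map.

```python
# A pure optimization pass: program in, new program out.  Here the
# toy pass constant-folds ("add", ("const", a), ("const", b)) into
# ("const", a + b), leaving every other instruction alone.

def fold_constants(program):
    def fold(inst):
        if (inst[0] == "add"
                and inst[1][0] == "const" and inst[2][0] == "const"):
            return ("const", inst[1][1] + inst[2][1])
        return inst
    # Build a fresh map; the input is untouched, so any analysis of
    # `before` computed earlier remains valid.
    return {label: fold(inst) for label, inst in program.items()}

before = {0: ("const", 2), 1: ("add", ("const", 2), ("const", 3))}
after = fold_constants(before)
```

With persistent intmaps the new map would share almost all of its structure with the old one; a plain Python dict copies instead, but the programming model is the same.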

Flow analysis

A[k] = meet(A[p] for p in preds[k])
         - kill[k] + gen[k]

Compute available values at labels:

  • A: intmap<label,intset<val>>
  • meet: intmap-intersect<intset-intersect>
  • -, +: intset-subtract, intset-union
  • kill[k]: values invalidated by cont because of side effects
  • gen[k]: values defined at k

But to keep it concrete, let's take the example of flow analysis. For example, you might want to compute "available values" at a given label: these are the values that are candidates for common subexpression elimination. For example if a term is dominated by a car x primcall whose value is bound to v, and there is no path from the definition of v to a subsequent car x primcall, we can replace that second duplicate operation with $values (v) instead.

There is a standard solution for this problem, which is to solve the flow equation above. I wrote about this at length ages ago, but looking back on it, the thing that pleases me is how easy it is to decompose the task of flow analysis into manageable parts, and how the types tell you exactly what you need to do. It's easy to compute an initial analysis A, easy to define your meet function when your maps and sets have built-in intersect and union operators, easy to define what addition and subtraction mean over sets, and so on.
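The flow equation above can be sketched as a fixed-point iteration; this toy Python version (the names and encoding are mine, not Guile's) solves it for the running example, with an empty kill set:

```python
# Solve A[k] = meet(A[p] for p in preds[k]) - kill[k] + gen[k],
# where meet is set intersection.  Start every label at "top" (all
# values) and iterate until nothing changes; a label with no
# predecessors starts from the empty set.

def flow_analysis(preds, gen, kill, all_vals):
    A = {k: frozenset(all_vals) for k in preds}
    changed = True
    while changed:
        changed = False
        for k in preds:
            inflow = (frozenset.intersection(*(A[p] for p in preds[k]))
                      if preds[k] else frozenset())
            new = (inflow - kill[k]) | gen[k]
            if new != A[k]:
                A[k] = new
                changed = True
    return A

preds = {"BB0": [], "BB1": ["BB0"], "BB2": ["BB0"],
         "BB3": ["BB1", "BB2"]}
gen = {"BB0": frozenset({"a0"}), "BB1": frozenset({"v0"}),
       "BB2": frozenset({"v1"}), "BB3": frozenset({"v2"})}
kill = {k: frozenset() for k in preds}
A = flow_analysis(preds, gen, kill, {"a0", "v0", "v1", "v2"})
```

At BB3 the meet of BB1's and BB2's sets keeps only a0, matching the earlier scope discussion: a0 and v2 are available at BB3, but v0 and v1 are not.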

Persistent data structures FTW

  • meet: intmap-intersect<intset-intersect>
  • -, +: intset-subtract, intset-union

Naïve: O(nconts * nvals)

Structure-sharing: O(nconts * log(nvals))

Computing an analysis isn't free, but it is manageable in cost: the structure-sharing means that meet is usually trivial (for fallthrough control flow) and the cost of + and - is proportional to the log of the problem size.

CPS soup: strengths

Relatively uniform, orthogonal

Facilitates functional transformations and analyses, lowering mental load: “I just have to write a function from foo to bar; I can do that”

Encourages global optimizations

Some kinds of bugs prevented by construction (unintended shared mutable state)

We get the SSA optimization literature

Well, we're getting to the end here, and I want to take a step back. Guile has used CPS soup as its middle-end IR for about 8 years now, enough time to appreciate its fine points while also understanding its weaknesses.

On the plus side, it has what to me is a kind of low cognitive overhead, and I say that not just because I came up with it: Guile's development team is small and not particularly well-resourced, and we can't afford complicated things. The simplicity of CPS soup works well for our development process (flawed though that process may be!).

I also like how, by having every variable be potentially a phi, any optimization that we implement will be global (i.e. not local to a basic block) by default.

Perhaps best of all, we get these benefits while also being able to use the existing SSA transformation literature. Because CPS is SSA, the lessons learned in SSA (e.g. loop peeling) apply directly.

CPS soup: weaknesses

Pointer-chasing, indirection through intmaps

Heavier than basic blocks: more control-flow edges

Names bound at continuation only; phi predecessors share a name

Over-linearizes control, relative to sea-of-nodes

Overhead of re-computation of analyses

CPS soup is not without its drawbacks, though. It's not suitable for JIT compilers, because it imposes some significant constant-factor (and sometimes algorithmic) overheads. You are always indirecting through intmaps and intsets, and these data structures involve significant pointer-chasing.

Also, there are some forms of lightweight flow analysis that can be performed naturally on a graph of basic blocks without looking too much at the contents of the blocks; for example in our available variables analysis you could run it over blocks instead of individual instructions. In these cases, basic blocks themselves are an optimization, as they can reduce the size of the problem space, with corresponding reductions in time and memory use for analyses and transformations. Of course you could overlay a basic block graph on top of CPS soup, but it's not a well-worn path.

There is a little detail that not all phi predecessor values have names, since names are bound at successors (continuations). But this is a detail; if these names are important, little $values trampolines can be inserted.

Probably the main drawback as an IR is that the graph of conts in CPS soup over-linearizes the program. There are other intermediate representations that don't encode ordering constraints where there are none; perhaps it would be useful to marry CPS soup with sea-of-nodes, at least during some transformations.

Finally, CPS soup does not encourage a style of programming where an analysis is incrementally kept up to date as a program is transformed in small ways. The result is that we end up performing much redundant computation within each individual optimization pass.

Recap

CPS soup is SSA, distilled

Labels and vars are small integers

Programs map labels to conts

Conts are the smallest labellable unit of code

Conts can have terms that continue to other conts

Compilation simplifies and lowers programs

Wasm vs VM backend: a question for another day :)

But all in all, CPS soup has been good for Guile. It's just SSA by another name, in a simpler form, with a functional flavor. Or, it's just CPS, but first-order only, without lambda.

In the near future, I am interested in seeing what a new GC will do for CPS soup; will bump-pointer allocation palliate some of the costs of pointer-chasing? We'll see. A tricky thing about CPS soup is that I don't think that anyone else has tried it in other languages, so it's hard to objectively understand its characteristics independent of Guile itself.

Finally, it would be nice to engage in the academic conversation by publishing a paper somewhere; I would like to see interesting criticism, and blog posts don't really participate in the citation graph. But in the limited time available to me, faced with the choice between hacking on something and writing a paper, it's always been hacking, so far :)

Speaking of limited time, I probably need to hit publish on this one and move on. Happy hacking to all, and until next time.

structure and interpretation of ark

2 May 2023 9:13 AM (igalia | compilers | mobile | ionic | capacitor | react | react native | nativescript | flutter | dart | javascript)

Hello, dear readers! Today's article describes Ark, a new JavaScript-based mobile development platform. If you haven't read them yet, you might want to start by having a look at my past articles on Capacitor, React Native, NativeScript, and Flutter; having a common understanding of the design space will help us understand where Ark is similar and where it differs.

Ark, what it is

If I had to bet, I would guess that you have not heard of Ark. (I certainly hadn't either, when commissioned to do this research series.) To a first approximation, Ark—or rather, what I am calling Ark; I don't actually know the name for the whole architecture—is a loosely Flutter-like UI library implemented on top of a dialect of JavaScript, with build-time compilation to bytecode (like Hermes) but also with support for just-in-time and ahead-of-time compilation of bytecode to native code. It is made by Huawei.

At this point if you are already interested in this research series, I am sure this description raises more questions than it answers. Flutter-like? A dialect? Native compilation? Targeting what platforms? From Huawei? We'll get to all of these, but I think we need to start with the last question.

How did we get here?

In my last article on Flutter, I told a kind of just-so history of how Dart and Flutter came to their point in the design space. Thanks to corrections from a kind reader, it happened to also be more or less correct. In this article, though, I haven't talked with Ark developers at all; I don't have the benefit of a true claim on history. And yet, the only way I can understand Ark is by inventing a narrative, so here we go. It might even be true!

Recall that in 2018, Huawei was a dominant presence in the smartphone market. They were shipping excellent hardware at good prices both to the Chinese and to the global markets. Like most non-Apple, non-Google manufacturers, they shipped Android, and like most Android OEMs, they shipped Google's proprietary apps (mail, maps, etc.).

But then, over the next couple years, the US decided that allowing Huawei to continue on as before was, like, against national security interests or something. Huawei was barred from American markets, a number of suppliers were forbidden from selling hardware components to Huawei, and even Google was prohibited from shipping its mobile apps on Huawei devices. The effect on Huawei's market share for mobile devices was enormous: its revenue was cut in half over a period of a couple years.

In this position, as Huawei, what do you do? I can't even imagine, but specifically looking at smartphones, I think I would probably do about what they did. I'd fork Android, for starters, because that's what you already know and ship, and Android is mostly open source. I'd probably plan on continuing to use its lower-level operating system pieces indefinitely (kernel and so on) because that's not a value differentiator. I'd probably ship the same apps on top at first, because otherwise you slip all the release schedules and lose revenue entirely.

But, gosh, there is the risk that your product will be perceived as just a worse version of Android: that's not a good position to be in. You need to be different, and ideally better. That will take time. In the meantime, you claim that you're different, without actually being different yet. It's a somewhat ridiculous position to be in, but I can understand how you get here; Ars Technica published a scathing review poking fun at the situation. But, you are big enough to ride it out, knowing that somehow eventually you will be different.

Up to now, this part of the story is relatively well-known; the part that follows is more speculative on my part. Firstly, I would note that Huawei had been working for a while on a compiler and language run-time called Ark Compiler, with the goal of getting better performance out of Android applications. If I understand correctly, this compiler took the Java / Dalvik / Android Run Time bytecodes as its input, and outputted native binaries along with a new run-time implementation.

As I can attest from personal experience, having a compiler leads to hubris: you start to consider source languages like a hungry person looks at a restaurant menu. "Wouldn't it be nice to ingest that?" That's what we say at restaurants, right, fellow humans? So in 2019 and 2020 when the Android rug was pulled out from underneath Huawei, I think having in-house compiler expertise allowed them to consider whether they wanted to stick with Java at all, or whether it might be better to choose a more fashionable language.

Like black, JavaScript is always in fashion. What would it mean, then, to retool Huawei's operating system -- by then known by the name "HarmonyOS" -- to expose a JavaScript-based API as its primary app development framework? You could use your Ark compiler somehow to implement JavaScript (hubris!) and then you need a UI framework. Having ditched Java, it is now thinkable to ditch all the other Android standard libraries, including the UI toolkit: you start anew, in a way. So are you going to build a Capacitor, a React Native, a NativeScript, a Flutter? Surely not precisely any of these, but what will it be like, and how will it differ?

Incidentally, I don't know the origin story for the name Ark, but to me it brings to mind tragedy and rebuilding: in the midst of being cut off from your rich Android ecosystem, you launch a boat into the sea, holding a promise of a new future built differently. Hope and hubris in one vessel.

Two programming interfaces

In the end, Huawei builds two things: something web-like and something like Flutter. (I don't mean to suggest copying or degeneracy here; it's rather that I can only understand things in relation to other things, and these are my closest points of comparison for what they built.)

The web-like programming interface specifies UIs using an XML dialect, HML, and styles the resulting node tree with CSS. You augment these nodes with JavaScript behavior; the main app is a set of DOM-like event handlers. There is an API to dynamically create DOM nodes, but unlike the other systems we have examined, the HarmonyOS documentation doesn't really sell you on using a high-level framework like Angular.

If this were it, I think Ark would not be so compelling: the programming model is more like what was available back in the DHTML days. I wouldn't expect people to be able to make rich applications that delight users, given these primitives, though CSS animation and the HML loop and conditional rendering from the template system might be just expressive enough for simple applications.

The more interesting side is the so-called "declarative" UI programming model which exposes a Flutter/React-like interface. The programmer describes the "what" of the UI by providing a tree of UI nodes in its build function, and the framework takes care of calling build when necessary and of rendering that tree to the screen.

Here I need to show some example code, because it is... weird. Well, I find it weird, but it's not too far from SwiftUI in flavor. A small example from the fine manual:

@Entry
@Component
struct MyComponent {
  build() {
    Stack() {
      Image($rawfile('Tomato.png'))
      Text('Tomato')
        .fontSize(26)
        .fontWeight(500)
    }
  }
}

The @Entry decorator (*) marks this struct (**) as being the main entry point for the app. @Component marks it as being a component, like a React functional component. Components conform to an interface (***) which defines them as having a build method which takes no arguments and returns no values: it creates the tree in a somewhat imperative way.

But as you see the flavor is somewhat declarative, so how does that work? Also, build() { ... } looks syntactically a lot like Stack() { ... }; what's the deal, are they the same?

Before going on to answer this, note my asterisks above: these are concepts that aren't in JavaScript. Indeed, programs written for HarmonyOS's declarative framework aren't JavaScript; they are in a dialect of TypeScript that Huawei calls ArkTS. In this case, an interface is a TypeScript concept. Decorators would appear to correspond to an experimental TypeScript feature, looking at the source code.

But struct is an ArkTS-specific extension, and Huawei has actually extended the TypeScript compiler to specifically recognize the @Component decorator, such that when you "call" a struct, for example as above in Stack() { ... }, TypeScript will parse that as a new expression type EtsComponentExpression, which may optionally be followed by a block. When Stack() is invoked, its children (instances of Image and Text, in this case) will be populated via running the block.
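To make the "imperative construction behind a declarative flavor" idea concrete, here is a small TypeScript sketch of one plausible lowering. To be clear, the create/pop builder protocol and the flattened props are my invention for illustration; I don't claim this is what Huawei's modified TypeScript compiler actually emits:

```typescript
// A node in the UI tree.
type UINode = { kind: string; props: Record<string, unknown>; children: UINode[] };

// Builder state: a stack of open components, plus the root once it exists.
const open: UINode[] = [];
let root: UINode | null = null;

// "Calling" a component opens a node; its trailing block then runs
// with this node as the current parent.
function create(kind: string, props: Record<string, unknown> = {}): void {
  const node: UINode = { kind, props, children: [] };
  const parent = open[open.length - 1];
  if (parent) parent.children.push(node);
  else root = node;
  open.push(node);
}

// Leaving the trailing block closes the node.
function pop(): void {
  open.pop();
}

// What the Stack() { Image(...) Text(...).fontSize(26) } example might
// lower to, with chained attribute calls flattened into props:
create("Stack");
create("Image", { src: "Tomato.png" });
pop();
create("Text", { content: "Tomato", fontSize: 26, fontWeight: 500 });
pop();
pop();

console.log(root?.children.map(c => c.kind)); // → ["Image", "Text"]
```

The point is just that a build method can look declarative while doing nothing but pushing and popping nodes on a stack as it runs.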

Now, though TypeScript isn't everyone's bag, it's quite normalized in the JavaScript community and not a hard sell. Language extensions like the handling of @Component pose a more challenging problem. Still, Facebook managed to sell people on JSX, so perhaps Huawei can do the same for their dialect. More on that later.

Under the hood, it would seem that we have a similar architecture to Flutter: invoking the components creates a corresponding tree of elements (as with React Native's shadow tree), which then are lowered to render nodes, which draw themselves onto layers using Skia, in a multi-threaded rendering pipeline. Underneath, the UI code actually re-uses some parts of Flutter, though from what I can tell HarmonyOS developers are replacing those over time.

Restrictions and extensions

So we see that the source language for the declarative UI framework is TypeScript, but with some extensions. It also has its restrictions, and to explain these, we have to talk about implementation.

Of the JavaScript mobile application development frameworks we discussed, Capacitor and NativeScript used "normal" JS engines from web browsers, while React Native built their own Hermes implementation. Hermes is also restricted, in a way, but mostly inasmuch as it lags the browser JS implementations; it relies on source-to-source transpilers to get access to new language features. ArkTS—that's the name of HarmonyOS's "extended TypeScript" implementation—has more fundamental restrictions.

Recall that the Ark compiler was originally built for Android apps. There you don't really have the ability to load new Java or Kotlin source code at run-time; in Java you have class loaders, but those load bytecode. On an Android device, you don't have to deal with the Java source language. If we use a similar architecture for JavaScript, though, what do we do about eval?

ArkTS's answer is: don't. As in, eval is not supported on HarmonyOS. In this way the implementation of ArkTS can be divided into two parts, a frontend that produces bytecode and a runtime that runs the bytecode, and you never have to deal with the source language on the device where the runtime is running. Like Hermes, the developer produces bytecode when building the application and ships it to the device for the runtime to handle.
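As a toy illustration of that split (this is not ArkTS's actual bytecode format or instruction set, just the shape of the architecture), consider a build-time frontend that compiles postfix arithmetic to a bytecode array, and a device-side interpreter that runs it without ever seeing source text:

```typescript
// A tiny bytecode "ISA": push a constant, add, or multiply.
type Op = ["push", number] | ["add"] | ["mul"];

// Build-time frontend: source goes in, bytecode comes out. This is the
// only place a parser exists; it never ships to the device.
function compile(tokens: string[]): Op[] {
  return tokens.map((t): Op => {
    if (t === "+") return ["add"];
    if (t === "*") return ["mul"];
    return ["push", Number(t)];
  });
}

// Device-side runtime: a stack interpreter over bytecode. No eval, no
// parser, no source language anywhere on the device.
function run(code: Op[]): number {
  const stack: number[] = [];
  for (const op of code) {
    if (op[0] === "push") {
      stack.push(op[1]);
    } else {
      const b = stack.pop()!, a = stack.pop()!;
      stack.push(op[0] === "add" ? a + b : a * b);
    }
  }
  return stack.pop()!;
}

// (2 + 3) * 4, in postfix; the bytecode is what ships in the app bundle.
const bytecode = compile(["2", "3", "+", "4", "*"]);
console.log(run(bytecode)); // 20
```

Dropping eval is exactly what makes this clean two-phase factoring possible: the moment source can arrive at run-time, the frontend has to ship too.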

Incidentally, before we move on to discuss the runtime, there are actually two front-ends that generate ArkTS bytecode: one written in C++ that seems to only handle standard TypeScript and JavaScript, and one written in TypeScript that also handles "extended TypeScript". The former has a test262 runner with about 10k skipped tests, and the latter doesn't appear to have a test262 runner. Note, I haven't actually built either one of these (or any of the other frameworks, for that matter).

The ArkTS runtime is itself built on a non-language-specific common Ark runtime, and the set of supported instructions is the union of the core ISA and the JavaScript-specific instructions. Bytecode can be interpreted, JIT-compiled, or AOT-compiled.

On the side of design documentation, it's somewhat sparse. There are some core design docs; readers may be interested in the rationale to use a bytecode interface for Ark as a whole, or the optimization overview.

Indeed ArkTS as a whole has a surfeit of optimizations, to an extent that makes me wonder which ones are actually needed. There are bytecode-to-bytecode optimizations, which I expect are useful if you are generating ArkTS bytecode from JavaScript, where you probably don't have a full compiler implementation. There is a completely separate optimizer in the eTS part of the run-time, based on what would appear to be a novel "circuit-based" IR that bears some similarity to sea-of-nodes. Finally the whole thing appears to bottom out in LLVM, which of course has its own optimizer. I can only assume that this situation is somewhat transitory. Also, ArkTS does appear to generate its own native code sometimes, notably for inline cache stubs.

Of course, when it comes to JavaScript, there are some fundamental language semantics and there is also a large and growing standard library. In the case of ArkTS, this standard library is part of the run-time, like the interpreter, compilers, and the garbage collector (generational concurrent mark-sweep with optional compaction).

All in all, when I step back from it, it's a huge undertaking. Implementing JavaScript is no joke. It appears that ArkTS has done the first 90% though; the proverbial second 90% should only take a few more years :)

Evaluation

If you told a younger me that a major smartphone vendor switched from Java to JavaScript for their UI, you would probably hear me react in terms of the relative virtues of the programming languages in question. At this point in my career, though, the only thing that comes to mind is what an expensive proposition it is to change everything about an application development framework. 200 people over 5 years would be my estimate, though of course teams are variable. So what is it that we can imagine that Huawei bought with a thousand person-years of investment? Towards what other local maximum are we heading?

Startup latency

I didn't mention it before, but it would seem that one of the goals of HarmonyOS is in the name: Huawei wants to harmonize development across the different range of deployment targets. To the extent possible, it would be nice to be able to write the same kinds of programs for IoT devices as you do for feature-rich smartphones and tablets and the like. In that regard one can see through all the source code how there is a culture of doing work ahead-of-time and preventing work at run-time; for example see the design doc for the interpreter, or for the file format, or indeed the lack of JavaScript eval.

Of course, this wide range of targets also means that the HarmonyOS platform bears the burden of a high degree of abstraction; not only can you change the kernel, but also the JavaScript engine (using JerryScript on "lite" targets).

I mention this background because sometimes in news articles and indeed official communication from recent years there would seem to be some confusion that HarmonyOS is just for IoT, or aimed to be super-small, or something. In this evaluation I am mostly focussed on the feature-rich side of things, and there my understanding is that the developer will generate bytecode ahead-of-time. When an app is installed on-device, the AOT compiler will turn it into a single ELF image. This should generally lead to fast start-up.

However it would seem that the rendering library that paints UI nodes into layers and then composites those layers uses Skia in the way that Flutter did pre-Impeller, which to be fair is a quite recent change to Flutter. I expect therefore that Ark (ArkTS + ArkUI) applications also experience shader compilation jank at startup, and that they may be well-served by tessellating their shapes into primitives like Impeller does, so that they can precompile a fixed, smaller set of shaders.

Jank

Maybe it's just that apparently I think Flutter is great, but ArkUI's fundamental architectural similarity to Flutter makes me think that jank will not be a big issue. There is a render thread that is separate from the ArkTS thread, so as with Flutter, async communication with main-thread interfaces is the main jank worry. And on the language side, ArkTS even has a number of extensions for sharing objects between threads without copying, should that be needed. I am not sure how well-developed and well-supported these extensions are, though.

I am hedging my words, of course, because I am missing a bit of social proof; HarmonyOS is still in infant days, and it doesn't have much in the way of a user base outside China, from what I can tell, and my ability to read Chinese is limited to what Google Translate can do for me :) Unlike other frameworks, therefore, I haven't been as able to catch a feel of the pulse of the ArkUI user community: what people are happy about, what the pain points are.

It's also interesting that unlike iOS or Android, HarmonyOS is only exposing these "web-like" and "declarative" UI frameworks for app development. This makes it so that the same organization is responsible for the software from top to bottom, which can lead to interesting cross-cutting optimizations: functional reactive programming isn't just a developer-experience narrative, but it can directly affect the shape of the rendering pipeline. If there is jank, someone in the building is responsible for it and should be able to fix it, whether it is in the GPU driver, the kernel, the ArkTS compiler, or the application itself.

Peak performance

I don't know how to evaluate ArkTS for peak performance. Although there is a JIT compiler, I don't have the feeling that it is as tuned for adaptive optimization as V8 is.

At the same time, I find it interesting that HarmonyOS has chosen to modify JavaScript. While they are at it, could they switch to a sound type system, to allow the kinds of AOT optimizations that Dart can do? It would be an interesting experiment.

As it is, though, if I had to guess, I would say that ArkTS is well-positioned for predictably good performance with AOT compilation, although I would be interested in seeing the results of actually running it.

Aside: On the importance of storytelling

In this series I have tried to be charitable towards the frameworks that I review, to give credit to what they are trying to do, even while noting where they aren't currently there yet. That's part of why I need a plausible narrative for how the frameworks got where they are, because that lets me have an idea of where they are going.

In that sense I think that Ark is at an interesting inflection point. When I started reading documentation about ArkUI and HarmonyOS and all that, I bounced out—there were too many architectural box diagrams, too many generic descriptions of components, too many promises with buzzwords. It felt to me like the project was trying to justify itself to a kind of clueless management chain. Was there actually anything here?

But now when I see the adoption of a modern rendering architecture and a bold new implementation of JavaScript, along with the willingness to experiment with the language, I think that there is an interesting story to be told, but this time not to management but to app developers.

Of course you wouldn't want to market to app developers when your system is still a mess because you haven't finished rebuilding an MVP yet. Retaking my charitable approach, then, I can only think that all the architectural box diagrams were a clever blind to avoid piquing outside interest while the app development kit wasn't ready yet :) As and when the system starts working well, presumably over the next year or so, I would expect HarmonyOS to invest much more heavily in marketing and developer advocacy; the story is interesting, but you have to actually tell it.

Aside: O platform, my platform

All of the previous app development frameworks that we looked at were cross-platform; Ark is not. It could be, of course: it does appear to be thoroughly open source. But HarmonyOS devices are the main target. What implications does this have?

A similar question arises in perhaps a more concrete way if we start with the mature Flutter framework: what would it mean to make a Flutter phone?

The first thought that comes to mind is that having a Flutter OS would allow for the potential for more cross-cutting optimizations that cross abstraction layers. But then I think, what does Flutter really need? It has the GPU drivers, and we aren't going to re-implement those. It has the bridge to the platform-native SDK, which is not such a large and important part of the app. You get input from the platform, but that's also not so specific. So maybe optimization is not the answer.

On the other hand, a Flutter OS would not have to solve the make-it-look-native problem; because there would be no other "native" toolkit, your apps won't look out of place. That's nice. It's not something that would make the platform compelling, though.

HarmonyOS does have this embryonic concept of app mobility, where like you could put an app from your phone on your fridge, or something. Clearly I am not doing it justice here, but let's assume it's a compelling use case. In that situation it would be nice for all devices to present similar abstractions, so you could somehow install the same app on two different kinds of devices, and they could communicate to transfer data. As you can see here though, I am straying far from my domain of expertise.

One reasonable way to "move" an app is to have it stay running on your phone and the phone just communicates pixels with your fridge (or whatever); this is the low-level solution. I think HarmonyOS appears to be going for the higher-level solution where the app actually runs logic on the device. In that case it would make sense to ship UI assets and JavaScript / extended TypeScript bytecode to the device, which would run the app with an interpreter (for low-powered devices) or use JIT/AOT compilation. The Ark runtime itself would live on all devices, specialized to their capabilities.

In a way this is the Apple watchOS solution (as I understand it): developers publish their apps as LLVM bitcode, and Apple compiles it for the specific devices. A FlutterOS with a Flutter run-time on all devices could do something similar. As with watchOS you wouldn't have to ship the framework itself in the app bundle; it would be on the device already.

Finally, publishing apps as some kind of intermediate representation also has security benefits: as the OS developer, you can ensure some invariants via the toolchain that you control. Of course, you would have to ensure that the Flutter API is sufficiently expressive for high-performance applications, while also not having arbitrary machine code execution vulnerabilities; there is a question of language and framework design as well as toolchain and runtime quality of implementation. HarmonyOS could be headed in this direction.

Conclusion

Ark is a fascinating effort that holds much promise. It's also still in motion; where will it be when it anneals to its own local optimum? It would appear that the system is approaching usability, but I expect a degree of churn in the near-term as Ark designers decide which language abstractions work for them and how to, well, harmonize them with the rest of JavaScript.

For me, the biggest open question is whether developers will love Ark in the way they love, say, React. In a market where Huawei is still a dominant vendor, I think the material conditions are there for a good developer experience: people tend to like Flutter and React, and Ark is similar. Huawei "just" needs to explain their framework well (and where it's hard to explain, to go back and change it so that it is explainable).

But in a more heterogeneous market, to succeed Ark would need to make a cross-platform runtime like the one Flutter has and engage in some serious marketing efforts, so that developers don't have to limit themselves to targetting the currently-marginal HarmonyOS. Selling extensions to JavaScript will be much more difficult in a context where the competition is already established, but perhaps Ark will be able to productively engage with TypeScript maintainers to move the language so it captures some of the benefits of Dart that facilitate ahead-of-time compilation.

Well, that's it for my review round-up; hope you have enjoyed the series. I have one more pending article, speculating about some future technologies. Until then, happy hacking, and see you next time.

structure and interpretation of flutter

26 April 2023 1:50 PM (igalia | compilers | mobile | ionic | capacitor | react | react native | nativescript | flutter | dart | javascript)

Good day, gentle hackfolk. Like an old-time fiddler I would appear to be deep in the groove, playing endless variations on a theme, in this case mobile application frameworks. But one can only recognize novelty in relation to the familiar, and today's note is a departure: we are going to look at Flutter, a UI toolkit based not on JavaScript but on the Dart language.

The present, from the past

Where to start, even? The problem is big enough that I'll approach it from three different angles: from the past, from the top, and from the bottom.

With the other frameworks we looked at, we didn't have to say much about their use of JavaScript. JavaScript is an obvious choice, in 2023 at least: it is ubiquitous, has high quality implementations, and as a language it is quite OK and progressively getting better. Up to now, "always bet on JS" has had an uninterrupted winning streak.

But winning is not the same as unanimity, and Flutter and Dart represent an interesting pole of contestation. To understand how we got here, we have to go back in time. Ten years ago, JavaScript just wasn't a great language: there were no modules, no async functions, no destructuring, no classes, no extensible iteration, no optional arguments to functions. In addition it was hobbled with a significant degree of what can only be called accidental sloppiness: the with statement, which can dynamically alter a lexical scope; direct eval, which can define new local variables; Function.caller; and so on. Finally, larger teams were starting to feel the need for more rigorous language tooling that could use types to prohibit some classes of invalid programs.

All of these problems in JavaScript have been addressed over the last decade, mostly successfully. But in 2010 or so if you were a virtual machine engineer, you might look at JavaScript and think that in some other world, things could be a lot better. That's effectively what happened: the team that originally built V8 broke off and started to work on what became Dart.

Initially, Dart was targetted for inclusion in the Chrome web browser as an alternate "native" browser language. This didn't work, for various reasons, but since then Dart grew the Flutter UI toolkit, which has breathed new life into the language. And this is a review of Flutter, not a review of Dart, not really anyway; to my eyes, Dart is spiritually another JavaScript, different but in the same family. Dart's implementation has many interesting aspects as well that we'll get into later on, but all of these differences are incidental: they could just as well be implemented on top of JavaScript, TypeScript, or another source language in that family. Even if Flutter isn't strictly part of the JavaScript-based mobile application development frameworks that we are comparing, it is valuable to the extent that it shows what is possible, and in that regard there is much to say.

Flutter, from the top

At its most basic, Flutter is a UI toolkit for Dart. In many ways it is like React. Like React, its interface follows the functional-reactive paradigm: programmers describe the "what", and Flutter takes care of the "how". Also, like the phenomenon in which new developers can learn React without really knowing JavaScript, Flutter is the killer app for Dart: Flutter developers mostly learn Dart at the same time that they pick up Flutter.

In some other ways, Flutter is the logical progression of React, going in the same direction but farther along. Whereas React-on-the-web takes the user's declarative specifications of what the UI should look like and lowers them into DOM trees, and React Native lowers them to platform-native UI widgets, Flutter has its own built-in layout, rasterization, and compositing engine: Flutter draws all the pixels.
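The division of labor in that functional-reactive contract can be sketched in a few lines of TypeScript. The names and shapes here are invented for illustration; this is not Flutter's or React's actual API, just the core loop they share: the app supplies a build function describing the "what", and the framework decides when to re-run it and what to do with the result:

```typescript
// A widget is a pure description of UI, not a live object.
type Widget = { kind: string; text?: string; children?: Widget[] };

// App state and the app's declarative description of the UI.
let state = { count: 0 };
function build(): Widget {
  return {
    kind: "Column",
    children: [{ kind: "Text", text: `count: ${state.count}` }],
  };
}

// Stand-in for the framework's rendering step: here we just serialize
// the tree; a real framework lowers it to render objects and pixels.
function render(w: Widget): string {
  return w.kind +
    (w.text ? `(${w.text})` : "") +
    (w.children ? `[${w.children.map(render).join(",")}]` : "");
}

const frames: string[] = [];

// The framework's side of the contract: after a state change, it is the
// framework (not the app) that calls build() again and re-renders.
function setState(update: () => void): void {
  update();
  frames.push(render(build()));
}

frames.push(render(build()));       // initial frame
setState(() => { state.count++; }); // app mutates state; framework reacts
console.log(frames);
// → ["Column[Text(count: 0)]", "Column[Text(count: 1)]"]
```

The app never says "update that label"; it only re-describes the whole tree, and the framework owns the "how".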

This has the predictable challenge that Flutter has to make significant investments so that its applications don't feel out-of-place on their platform, but on the other hand it opens up a huge space for experimentation and potential optimization: Flutter has the potential to beat native at its own game. Recall that with React Native, the result of the render-commit-mount process is a tree of native widgets. The native platform will surely then perform a kind of layout on those widgets, divide them into layers that correspond to GPU textures, paint those layers, then composite them to the screen -- basically, what a web engine will do.

What if we could instead skip the native tree and go directly to the lower GPU layer? That is the promise of Flutter. Flutter has the potential to create much more smooth and complex animations than the other application development frameworks we have mentioned, with lower overhead and energy consumption.

In practice... that's always the question, isn't it? Again, please accept my usual caveat that I am a compilers guy moonlighting in the user interface domain, but my understanding is that Flutter mostly lives up to its promise, but with one significant qualification which we'll get to in a minute. But before that, let's traverse Flutter from the other direction, coming up from Dart.

Dart, from the bottom

To explain some aspects of Dart I'm going to tell a just-so story that may or may not be true. I know and like many of the Dart developers, and we have similar instincts, so it's probably not too far from the truth.

Let's say you are the team that originally developed V8, and you decide to create a new language. You write a new virtual machine that looks like V8, taking Dart source code as input and applying advanced adaptive compilation techniques to get good performance. You can even be faster than JS because your language is just a bit more rigid than JavaScript is: you have traded off expressivity for performance. (Recall from our discussion of NativeScript that expressivity isn't a value judgment: there can be reasons to pay for more "mathematically appealing operational equivalences", in Felleisen's language, in exchange for applying more constraints on a language.)

But, you fail to ship the VM in a browser; what do you do? The project could die; that would be annoying, but you work for Google, so it happens all the time. However, a few interesting things happen around the same time that will cause you to pivot. One is a concurrent experiment by Chrome developers to pare the web platform down to its foundations and rebuild it. This effort will eventually become Flutter; while it was originally based on JS, eventually they will choose to switch to Dart.

The second thing that happens is that recently-invented smart phones become ubiquitous. Most people have one, and the two platforms are iOS and Android. Flutter wants to target them. You are looking for your niche, and you see that mobile application development might be it. As the Flutter people continue to experiment, you start to think about what it would mean to target mobile devices with Dart.

The initial Dart VM was made to JIT, but as we know, Apple doesn't let people do this on iOS. So instead you look to write a quick-and-dirty ahead-of-time compiler, based on your JIT compiler that takes your program as input, parses and baseline-compiles it, and generates an image that can be loaded at runtime. It ships on iOS. Funnily enough, it ships on Android too, because AOT compilation allows you to avoid some startup costs; forced to choose between peak performance via JIT and fast startup via AOT, you choose fast startup.

It's a success, you hit your product objectives, and you start to look further, to a proper ahead-of-time compiler to native code that can stand alone without the full Dart run-time. After all, if you have to compile at build-time, you might as well take the time to do some proper optimizations. You actually change the language to have a sound type system so that the compiler can make program transformations that are valid as long as it can rely on the program's types.

Fun fact: I am told that the shift to a sound type system actually started before Flutter and thus before AOT, because of a Dart-to-JavaScript compiler that you inherited from the time in which you thought the web would be the main target. The Dart-to-JS compiler used to be a whole-program compiler; this enabled it to do flow-sensitive type inference, resulting in faster and smaller emitted JavaScript. But whole-program compilation doesn't scale well in terms of compilation time, so Dart-to-JS switched to separate per-module compilation. But then you lose lots of types! The way to recover the fast-and-small-emitted-JS property was through a stronger, sound type system for Dart.

At this point, you still have your virtual machine, plus your ahead-of-time compiler, plus your Dart-to-JS compiler. Such riches, such bounty! It is not a bad situation to be in, in 2023: you can offer a good development experience via the just-in-time compiled virtual machine. Apparently you can even use the JIT on iOS in developer mode, because attaching ptrace to a binary allows for native code generation. Then when you go to deploy, you make a native binary that includes everything.

For the web, you also have your nice story, even nicer than with JavaScript in some ways: because the type checker and ahead-of-time compiler are integrated in Dart, you don't have to worry about WebPack or Vite or minifiers or uglifiers or TypeScript or JSX or Babel or any of the other things that JavaScript people are used to. Granted, the tradeoff is that innovation is mostly centralized with the Dart maintainers, but currently Google seems to be investing enough so that's OK.

Stepping back, this story is not unique to Dart; many of its scenes also played out in the world of JavaScript over the last 5 or 10 years as well. Hermes (and QuickJS, for that matter) does ahead-of-time compilation, albeit only to bytecode, and V8's snapshot facility is a form of native AOT compilation. But the tooling in the JavaScript world is more diffuse than with Dart. With the perspective of developing a new JavaScript-based mobile operating system in mind, the advantages that Dart (and thus Flutter) has won over the years are also on the table for JavaScript to win. Perhaps even TypeScript could eventually migrate to have a sound type system, over time; it would take a significant investment but the JS ecosystem does evolve, if slowly.

(Full disclosure: while the other articles in this series were written without input from the authors of the frameworks under review, through what I can only think was good URL guesswork, a draft copy of this article leaked to Flutter developers. Dart hacker Slava Egorov kindly sent me a mail correcting a number of misconceptions I had about Dart's history. Fair play on whoever guessed the URL, and many thanks to Slava for the corrections; any remaining errors are wholly mine, of course!)

Evaluation

So how do we expect Flutter applications to perform? If we were writing a new mobile OS based on JavaScript, what would it mean in terms of performance to adopt a Flutter-like architecture?

Startup latency

Flutter applications are well-positioned to start fast, with ahead-of-time compilation. However, they have had problems realizing this potential, with many users seeing a big stutter when they launch a Flutter app.

To explain this situation, consider the structure of a typical low-end Android mobile device: you have a small number of not-terribly-powerful CPU cores, but attached to the same memory you also have a decent GPU with many cores. For example, the SoC in the low-end Moto E7 Plus has 8 CPU cores and 128 GPU cores (texture shader units). You could paint widget pixels into memory from either the CPU or the GPU, but you'd rather do it on the GPU because it has so many more cores: in the time it takes the CPU to compute the color of a single pixel, the GPU could compute, like, 128 of them, especially since the comparison is often between multi-threaded rasterization on the GPU and single-threaded rasterization on the CPU.

Flutter has always tried to paint on the GPU. Historically it has done so via a GPU back-end to the Skia graphics library, notably used by Chrome among other projects. But, Skia's API is a drawing API, not a GPU API; Skia is the one responsible for configuring the GPU to draw what we want. And here's the problem: configuring the GPU takes time. Skia generates shader code at run-time for rasterizing the specific widgets used by the Flutter programmer. That shader code then needs to be compiled to the language the GPU driver wants, which looks more like Vulkan or Metal. The process of compilation and linking takes time, potentially seconds, even.

The solution to "too much startup shader compilation" is much like the solution to "too much startup JavaScript compilation": move this phase to build time. The new Impeller rendering library does just that. However to do that, it had to change the way that Flutter renders: instead of having Skia generate specialized shaders at run-time, Impeller instead lowers the shapes that it draws to a fixed set of primitives, and then renders those primitives using a smaller, fixed set of shaders. These primitive shaders are pre-compiled at build time and included in the binary. By switching to this new renderer, Flutter should be able to avoid startup jank.

Jank

Of all the application development frameworks we have considered, to my mind Flutter is the best positioned to avoid jank. It has the React-like asynchronous functional layout model, but "closer to the metal"; by skipping the tree of native UI widgets, it can potentially spend less time rendering each frame.

When you start up a Flutter app on iOS, the shell of the application is actually written in Objective-C++. On Android it's the same, except that the shell is Java. That shell then creates a FlutterView widget and spawns a new thread to actually run Flutter (and the user's Dart code). Mostly, Flutter runs on its own, rendering frames to the GPU resources backing the FlutterView directly.

If a Flutter app needs to communicate with the platform, it passes messages across an asynchronous channel back to the main thread. Although these messages are asynchronous, this is probably the largest potential source of jank in a Flutter app, outside the initial frame paint: any graphical update which depends on the answer to an asynchronous call may lag.

Peak performance

Dart's type system and ahead-of-time compiler optimize for predictable good performance rather than the more variable but potentially higher peak performance that could be provided by just-in-time compilation.

This story should probably serve as a lesson to any future platform. The people that developed the original Dart virtual machine had a built-in bias towards just-in-time compilation, because it allows the VM to generate code that is specialized not just to the program but also to the problem at hand. A given system with ahead-of-time compilation can always be made to perform better via the addition of a just-in-time compiler, so the initial focus was on JIT compilation. On iOS of course this was not possible, but on Android and other platforms where this was available it was the default deployment model.

However, even Android switched to ahead-of-time compilation instead of the JIT model in order to reduce startup latency: doing any machine code generation at all at program startup was more work than was needed to get to the first frame. One could add JIT back again on top of AOT but it does not appear to be a high priority.

I would expect that Capacitor could beat Dart in some raw throughput benchmarks, given that Capacitor's JavaScript implementation can take advantage of the platform's native JIT capability. Does it matter, though, as long as you are hitting your frame budget? I do not know.

Aside: An escape hatch to the platform

What happens if you want to embed a web view into a Flutter app?

If you think on the problem for a moment I suspect you will arrive at the unsatisfactory answer, which is that for better or for worse, at this point it is too expensive even for Google to make a new web engine. Therefore Flutter will have to embed the native WebView. However Flutter runs on its own threads; the native WebView has its own process and threads but its interface to the app is tied to the main UI thread.

Therefore either you need to make the native WebView (or indeed any other native widget) render itself to (a region of) Flutter's GPU backing buffer, or you need to copy the native widget's pixels into its own texture and then composite them in Flutter-land. It's not so nice! The Android and iOS platform view documentation discusses some of the tradeoffs and mitigations.

Aside: For want of a canvas

There is a very funny situation in the React Native world in which, if the application programmer wants to draw to a canvas, they have to embed a whole WebView into the React Native app and then proxy the canvas calls into the WebView. Flutter is happily able to avoid this problem, because it includes its own drawing library with a canvas-like API. Of course, Flutter also has the luxury of defining its own set of standard libraries instead of necessarily inheriting them from the web, so when and if they want to provide equivalent but differently-shaped interfaces, they can do so.

Flutter manages to be more expressive than React Native in this case, without losing much in the way of understandability. Few people will have to reach for the canvas layer, but it is nice to know it is there.

Conclusion

Dart and Flutter are terribly attractive from an engineering perspective. They offer a delightful API and a high-performance, flexible runtime with a built-in toolchain. Could this experience be brought to a new mobile operating system as its primary programming interface, based on JavaScript? React Native is giving it a try, but I think there may be room to take things further to own the application from the program all the way down to the pixels.

Well, that's all from me on Flutter and Dart for the time being. Next up, a mystery guest; see you then!

structure and interpretation of nativescript

24 April 2023 9:10 AM (igalia | compilers | mobile | ionic | capacitor | react | react native | nativescript | flutter | dart | javascript)

Greetings, hackers tall and hackers small!

We're only a few articles into this series on mobile application development frameworks, but I feel like we are already well into our journey. We started our trip through the design space with a look at Ionic / Capacitor, which defines its user interface in terms of the web platform, and only calls out to iOS or Android native features as needed. We proceeded on to React Native, which moves closer to native by rendering to platform-provided UI widgets, layering a cross-platform development interface on top.

Today's article takes an in-depth look at NativeScript, whose point in the design space is further on the road towards the platform, unabashedly embracing the specificities of the API available on iOS and Android, exposing these interfaces directly to the application programmer.

In practice what this looks like is that a NativeScript app is a native app which simply happens to run JavaScript on the main UI thread. That JavaScript has access to all native APIs, directly, without the mediation of serialization or message-passing over a bridge or message queue.

The first time I heard this I thought that it couldn't actually be all native APIs. After all, new versions of iOS and Android come out quite frequently, and surely it would take some effort on the part of NativeScript developers to expose the new APIs to JavaScript. But no, it really includes all of the various native APIs: the NativeScript developers wrote a build-time inspector that uses the platform's native reflection capabilities to grovel through all available APIs and to automatically generate JavaScript bindings, with associated TypeScript type definitions so that the developer knows what is available.

Some of these generated files are checked into source, so you can get an idea of the range of interfaces that are accessible to programmers; for example, see the iOS type definitions for x86-64. There are bindings for, like, everything.
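To make the binding-generation idea concrete, here is a toy model in plain JavaScript. The "native module" and all the names in it are invented for illustration; the real NativeScript inspector works over Objective-C and Java reflection metadata at build time, not over JavaScript objects at run time. But the shape of the trick is the same: enumerate what the platform exposes, and mechanically wrap it.

```javascript
// Stand-in for a native API surface that a build-time inspector might
// discover via reflection. (Hypothetical; not a real NativeScript module.)
const fakeNativeModule = {
  batteryLevel: () => 87,
  deviceName: () => "emulator",
};

// "Generate" bindings by groveling over the module's properties, the way
// NativeScript's inspector grovels over platform reflection data.
function generateBindings(nativeModule) {
  const bindings = {};
  for (const name of Object.getOwnPropertyNames(nativeModule)) {
    if (typeof nativeModule[name] === "function") {
      // Each wrapper just forwards to the underlying "native" function.
      bindings[name] = (...args) => nativeModule[name](...args);
    }
  }
  return bindings;
}

const device = generateBindings(fakeNativeModule);
console.log(device.batteryLevel()); // 87
```

The generated wrappers are entirely mechanical, which is what makes it feasible to cover the whole platform API surface, and to regenerate the bindings (and their TypeScript definitions) whenever a new OS version ships.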

Given access to all the native APIs, how do you go about making an app? You could write the same kind of programs that you would in Swift or Kotlin, but in JavaScript. But this would require more than just the ability to access native capabilities when needed: it needs a thorough knowledge of the platform interfaces, plus NativeScript itself on top. Most people don't have this knowledge, and those that do are probably programming directly in Swift or Kotlin already.

On one level, NativeScript's approach is to take refuge in that most ecumenical of adjectives, "unopinionated". Whereas Ionic / Capacitor encourages use of web platform interfaces, and React Native only really supports React as a programming paradigm, NativeScript provides a low-level platform onto which you can layer a number of different high-level frameworks.

Now, most high-level JavaScript application development frameworks are oriented to targeting the web: they take descriptions of user interfaces and translate them to the DOM. When targeting NativeScript, you could make it so that they target native UI widgets instead. However, given the baked-in assumptions about how widgets should be laid out (notably via CSS), there is some impedance-matching to do between DOM-like APIs and native toolkits.

NativeScript's answer to this problem is a middle layer: a cross-platform UI library that provides DOM-like abstractions and CSS layout in a way that bridges the gap between web-like and native. You can even define parts of the UI using a NativeScript-specific XML vocabulary, which NativeScript compiles to native UI widget calls at run-time. Of course, there is no CSS engine in UIKit or Android's UI toolkit, so NativeScript includes its own, implemented in JavaScript of course.
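A toy sketch of that middle-layer translation, in JavaScript: a DOM-like, cross-platform description of a view is walked at run-time and turned into calls on a "native" toolkit. The element names and the fake toolkit below are invented for illustration; NativeScript's real UI library does much more (CSS, layout, events), but the run-time lowering has this general shape.

```javascript
// Cross-platform description, as parsing the XML vocabulary might produce.
// (Element names are NativeScript-flavored but the structure is invented.)
const view = {
  tag: "StackLayout",
  children: [
    { tag: "Label", text: "Hello" },
    { tag: "Button", text: "Tap me" },
  ],
};

// Pretend "native" toolkit: records calls instead of creating real widgets.
const nativeCalls = [];
const nativeToolkit = {
  create(kind, props) {
    nativeCalls.push(`${kind}(${props.text ?? ""})`);
  },
};

// Run-time translation from the DOM-like tree to native widget calls.
function realize(node, toolkit) {
  toolkit.create(node.tag, node);
  for (const child of node.children ?? []) realize(child, toolkit);
}

realize(view, nativeToolkit);
console.log(nativeCalls.join(", "));
// StackLayout(), Label(Hello), Button(Tap me)
```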

You could program directly to this middle layer, but I suspect that its real purpose is in enabling Angular, Vue, Svelte, or the like. The pitch would be that NativeScript lets app developers use pleasant high-level abstractions, but while remaining close to the native APIs; you can always drop down for more power and expressiveness if needed.

Diving back down to the low level: as we mentioned, all of the interactions between JavaScript and the native platform APIs happen on the main application UI thread. NativeScript does also allow programmers to create background threads, using an implementation of the Web Worker API. One could even in theory run a React-based UI in a worker thread and proxy native UI updates to the main thread; as an unopinionated platform, NativeScript can support many different frameworks and paradigms.

Finally, there is the question of how NativeScript runs the JavaScript in an application. Recall that Ionic / Capacitor uses the native JS engine, by virtue of using the native WebView, and that React Native used to use JavaScriptCore on both platforms but now uses its own Hermes implementation. NativeScript is another point in the design space, using V8 on both platforms. (They used to use JavaScriptCore on iOS but switched to V8 once V8 was able to run on iOS in "jitless" mode.) Besides the reduced maintenance burden of using a single implementation on all platforms, this also has the advantage of being able to use V8 snapshots to move JavaScript parse-and-compile work to build-time, even on iOS.

Evaluation

NativeScript is fundamentally simple: it's V8 running in an application's main UI thread, with access to all platform native APIs. So how do we expect it to perform?

Startup latency

In theory, applications with a NativeScript-like architecture should have no problem with startup time, because they can pre-compile all of their JavaScript into V8 snapshots. Snapshots are cheap to load up because they are already in a format that V8 is ready to consume.

In practice, it would seem that V8 snapshots do not perform as expected for NativeScript. There are a number of aspects about this situation that I don't understand, which I suspect relate to the state of the tooling around V8 rather than to the fundamental approach of ahead-of-time compilation. V8 is really made for Chrome, and it could be that not enough maintenance resources have been devoted to this snapshot facility.

In the meantime, NativeScript instead uses V8's code cache feature, which caches the result of parsing and compiling JavaScript files on the device. In this way the first time an app is installed or updated, it might start up slowly, but subsequent runs are faster. If you were designing a new operating system, you'd probably want to move this work to app install-time.

As we mentioned above, NativeScript apps have access to all native APIs. That is a lot of APIs, and only some of those interfaces will actually be used by any given app. In an ideal world, we would expect the build process to only include JavaScript code for those APIs that are needed by the application. However in the presence of eval and dynamic property lookup, pruning the native API surface to the precise minimum is a hard problem for a bundler to perform on its own. The solution for the time being is to manually allow and deny subsets of the platform native API. It's not an automatic process though, so it can be error-prone.

Besides the work that the JavaScript engine has to do to load an application's code, the other startup overhead involves whatever work that JavaScript might need to perform before the first frame is shown. In the case of NativeScript, more work is done before the initial layout than one would think: the main UI XML file is parsed by an XML parser written in JavaScript, any needed CSS files are parsed and loaded (again by JavaScript), and the tree of XML elements is translated to a tree of UI elements. The layout of the items in the view tree is then computed (in JavaScript, but calling into native code to measure text and so on), and then the app is ready.

At this point, I am again going to wave my "I am just a compiler engineer" flag: I am not a UI specialist, much less a NativeScript specialist. As in compilers, performance measurement and monitoring are key to UI development, but I suspect that also as in compilers there is a role for gut instinct. Incremental improvements are best driven by metrics, but qualitative leaps are often the result of somewhat ineffable hunches or even guesswork. In that spirit I can only surmise that React Native has an advantage over NativeScript in time-to-first-frame, because its layout is performed in C++ and because its element tree is computed directly from JavaScript instead of having JavaScript interpret XML and CSS files. In any case, I look forward to the forthcoming part 2 of the NativeScript and React Native performance investigations that were started in November 2022.

If I were the NativeScript developers, and if startup latency for apps using NativeScript's UI framework proved to actually be a problem, I would lean into something in the shape of Angular's ahead-of-time compilation mode, but for the middle NativeScript UI layer.

Jank

On the face of it, NativeScript is the most jank-prone of the three frameworks we have examined, because it runs JavaScript on the main application UI thread, interleaved with UI event handling and painting and all of that. If an app's JavaScript takes too long to run, the app might miss frames or fail to promptly handle an event.

On the other hand, relative to React Native, the user's code is much closer to the application's behavior. There's no asynchrony between the application's logic and its main loop: in NativeScript it is easy to identify the code causing jank and eventually fix it.

The other classic JavaScript-on-the-main-thread worry relates to garbage collection pauses. V8's garbage collector does try to minimize the stop-the-world phase by tracing the heap concurrently and leveraging parallelism during pauses. Also, the user interface of a mobile app runs in an event loop, and typically spends most of its time idle; V8 exposes some API that can take advantage of this idle time to perform housekeeping tasks instead of needing to do them when handling high-priority events.

That said, having looked into the code of both the iOS and Android run-times, NativeScript does not currently take advantage of this facility. I dug deeper and it would seem that V8 itself is in flux, as the IdleNotificationDeadline API is on its way out; is the thought that concurrent tracing is largely sufficient? I would expect that if conservative stack scanning lands, we will see a re-introduction of this kind of API, as it does make sense to synchronize with the event loop when scanning the main thread stack.

Peak performance

As we have seen in our previous evaluations, this question boils down to "is the JavaScript engine state-of-the-art, and can it perform just-in-time compilation". In the case of NativeScript, the answers are yes and maybe, respectively: V8 is state-of-the-art, and it can JIT on Android, but not on iOS.

Perhaps the mitigation here is that the hardware that iOS runs on tends to be significantly more powerful than median Android devices; if you had to pick a subset of users to penalize with an interpreter-only run-time, people with iPhones are the obvious choice, because they can afford it.

Aside: Are markets wise?

Recall that our perspective in this series is that of the designer of a new JavaScript-based mobile development platform. We are trying to answer the question of what it would look like if a new platform offered a NativeScript-like experience. In this regard, only the structure of NativeScript is of interest, and notably its "market success" is not relevant, except perhaps in some Hayekian conception of the world in which markets are necessarily smarter than, well, me, or you, or any one of us.

It must be said, though, that React Native is the 800-pound gorilla of JavaScript mobile application development. The 2022 State of JS survey shows that among survey respondents, more people are aware of React Native than any other mobile framework, and people are generally more positive about React Native than other frameworks. Does NativeScript's modest market share indicate something about its architecture, or does it speak more to the size of Facebook's budget, both on the developer-experience side and on marketing?

Aside: On the expressive power of application frameworks

Oddly, I think the answer to the market wisdom question might be found in a computer science paper from the early 1990s, "On the expressive power of programming languages" (PDF).

In this paper, Matthias Felleisen considers the notion of what it means for one programming language to be more expressive than another. For example, is a language with just for less expressive than a language with both for and while? Intuitively we would say no, these are similar things; you can make a simple local transformation of while (x) {...} to for (;x;) {...} and you have exactly the same program semantics. On the other hand a language with just for is less expressive than one which also has goto; there is no simple local rewrite that can turn goto into for.
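The local rewrite from the paragraph above, made concrete in JavaScript: the two functions below differ only by the mechanical transformation of `while (x) {...}` into `for (;x;) {...}`, and they compute exactly the same result, which is why adding `while` to a language that already has `for` does not increase its expressive power.

```javascript
// A loop written with `while`.
function countdownWhile(n) {
  const out = [];
  while (n > 0) {
    out.push(n);
    n--;
  }
  return out;
}

// The same program after the local rewrite while (x) {...} → for (;x;) {...}.
function countdownFor(n) {
  const out = [];
  for (; n > 0; ) {
    out.push(n);
    n--;
  }
  return out;
}

console.log(JSON.stringify(countdownWhile(3))); // [3,2,1]
console.log(
  JSON.stringify(countdownWhile(3)) === JSON.stringify(countdownFor(3))
); // true
```

No such mechanical, purely local rewrite exists to express an arbitrary `goto` in terms of `for`; that asymmetry is what Felleisen's definition of expressiveness captures.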

In the same way, we can consider the question of what it would mean for one library to be more expressive than another. After all, the API of a library exposes a language in which its user can write programs; we should be able to reason about these languages. So between React Native and NativeScript, which one is more expressive?

By Felleisen's definitions, NativeScript is clearly the more expressive language: there is no simple local transformation that can turn imperative operations on native UI widgets into equivalent functional-reactive programs. Yes, with enough glue code React Native can reach directly to native APIs in a similar way as NativeScript, but everything that touches the native UI tree is expressly under React Native's control: there is no sanctioned escape hatch.

You might think that "more expressive" is always better, but Felleisen's take is more nuanced than that. Yes, he says, more expressive languages do allow programmers to make more concise programs, because they allow programmers to define abstractions that encapsulate patterns, and this is a good thing. However he also concludes that "an increase in expressive power is related to a decrease of the set of 'natural' (mathematically appealing) operational equivalences." Less expressive programming languages are easier to reason about, in general, and indeed that is one of the recognized strengths of React's programming model: it is easy to compose components and have confidence that the result will work.

Summary

A NativeScript-like architecture offers the possibility of performance: the developer has all the capabilities needed for writing pleasant-to-use applications that blend in with the platform-native experience. It is up to the developers to choose how to use the power at their disposal. In the wild, I expect that the low-level layer of NativeScript's API is used mainly by expert developers, who know how to assemble well-functioning machines from the parts on offer.

As a primary programming interface for a new JavaScript-based mobile platform, though, just providing a low-level API would seem to be not enough. NativeScript rightly promotes the use of more well-known high-level frameworks on top: Angular, Vue, Svelte, and such. Less experienced developers should use an opinionated high-level UI framework; these developers don't have good opinions yet and the API should lead them in the right direction.

That's it for today. Thanks for reading these articles, by the way; I have enjoyed diving into this space. Next up, we'll take a look beyond JavaScript, to Flutter and Dart. Until then, happy hacking!

structure and interpretation of react native

21 April 2023 8:20 AM (igalia | compilers | mobile | ionic | capacitor | react | react native | nativescript | flutter | dart | javascript)

Hey hey! Today's missive continues exploring the space of JavaScript and mobile application development.

Yesterday we looked into Ionic / Capacitor, giving a brief structural overview of what Capacitor apps look like under the hood and how this translates to three aspects of performance: startup latency, jank, and peak performance. Today we'll apply that same approach to another popular development framework, React Native.

Background: React

I don't know about you, but I find that there is so much marketing smoke and mirrors around the whole phenomenon that is React and React Native that sometimes it's hard to see what's actually there. This is compounded by the fact that the programming paradigm espoused by React (and its "native" cousin that we are looking at here) is so effective at enabling JavaScript UI programmers to focus on the "what" and not the "how" that the machinery supporting React recedes into the background.

At its most basic, React is what they call a functional reactive programming model. It is functional in the sense that the user interface elements render as a function of the global application state. The reactive comes into how user input is handled, but I'm not going to focus on that here.

React's rendering process starts with a root element tree, describing the root node of the user interface. An element is a JavaScript object with a type property. To render an element tree, if the value of the type property is a string, then the element is terminal and doesn't need further lowering, though React will visit any node in the children property of the element to render them as needed.

Otherwise if the type property of an element is a function, then the element node is functional. In that case React invokes the node's render function (the type property), passing the JavaScript element object as the argument. React will then recursively re-render the element tree produced as a result of rendering the component until all nodes are terminal. (Functional element nodes can instead have a class as their type property, but the concerns are pretty much the same.)

(In the language of React Native, a terminal node is a React Host Component, and a functional node is a React Composite Component, and both are React Elements. There are many imprecisely-used terms in React and I will continue this tradition by using the terms I mention above.)

The rendering phase of a React application is thus a function from an element tree to a terminal element tree. Nodes of element trees can be either functional or terminal. Terminal element trees are composed only of terminal elements. Rendering lowers all functional nodes to terminal nodes. This description applies both to React (targeting the web) and React Native (which we are reviewing here).
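A minimal sketch of that rendering phase, as a toy model rather than React's actual implementation: lower an element tree until every node's `type` is a string (terminal), invoking the render function of any functional node along the way. The component and property names are invented.

```javascript
// Lower an element tree to a terminal element tree.
function render(element) {
  if (typeof element.type === "function") {
    // Functional node: invoke its render function and keep lowering the
    // tree that it produces.
    return render(element.type(element));
  }
  // Terminal node: recursively render any children.
  const children = (element.children ?? []).map(render);
  return { ...element, children };
}

// A functional component that lowers to a terminal "text" element.
const Greeting = (props) => ({ type: "text", text: `Hello, ${props.name}` });

const tree = render({
  type: "view",
  children: [{ type: Greeting, name: "hacker" }],
});

console.log(tree.children[0].type, "-", tree.children[0].text);
// text - Hello, hacker
```

Note that the result contains only terminal nodes: the `Greeting` node has been replaced by the `text` element it rendered to.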

It's probably useful to go deeper into what React does with a terminal element tree, before building to the more complex pipeline used in React Native, so here we go. The basic idea is that React-on-the-web does impedance-matching between the functional description of what the UI should be (the terminal element tree) and the stateful tree of DOM nodes that a web browser uses to actually paint and display the UI. When rendering yields a new terminal element tree, React will compute the difference between the new and old trees. From that difference React then computes the set of imperative actions needed to mutate the DOM tree to correspond to what the new terminal element tree describes, and finally applies those changes.

In this way, small changes to the leaves of a React element tree should correspond to small changes in the DOM. Additionally, since rendering is a pure function of the global application state, we can avoid rendering at all when the application state hasn't changed. We'll dive into performance more deeply later on in the article.
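A toy version of that diffing step, continuing the same sketch: compare old and new terminal trees and emit the imperative mutations needed to bring the old one up to date. React's real reconciler is far more involved (keys, reordering, removals, batching); this just illustrates why a small change to the tree yields a small set of mutations.

```javascript
// Walk two terminal trees in parallel, accumulating mutation operations.
// (Removals and reordering are omitted for brevity.)
function diff(oldNode, newNode, path = "root", ops = []) {
  if (!oldNode) {
    ops.push(`create ${path} (${newNode.type})`);
  } else if (oldNode.type !== newNode.type) {
    ops.push(`replace ${path} with ${newNode.type}`);
  } else if (oldNode.text !== newNode.text) {
    ops.push(`set-text ${path} to "${newNode.text}"`);
  }
  const n = Math.max(
    (oldNode?.children ?? []).length,
    (newNode?.children ?? []).length
  );
  for (let i = 0; i < n; i++) {
    const oc = oldNode?.children?.[i];
    const nc = newNode?.children?.[i];
    if (nc) diff(oc, nc, `${path}.${i}`, ops);
  }
  return ops;
}

const oldTree = { type: "view", children: [{ type: "text", text: "Hi" }] };
const newTree = { type: "view", children: [{ type: "text", text: "Hello" }] };
console.log(diff(oldTree, newTree).join("; "));
// set-text root.0 to "Hello"
```

Changing one leaf's text produces a single mutation, leaving the rest of the (stateful) UI tree untouched.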

React Native doesn't use a WebView

React Native is similar to React-on-the-web in intent but different in structure. Instead of using a WebView on native platforms, as Ionic / Capacitor does, React Native renders the terminal element tree to platform-native UI widgets.

When a React Native functional element renders to a terminal element, it will create not just a JS object for the terminal node as React-on-the-web does, but also a corresponding C++ shadow object. The fully lowered tree of terminal elements will thus have a corresponding tree of C++ shadow objects. React Native will then calculate the layout for each node in the shadow tree, and then commit the shadow tree: as on the web, React Native computes the set of imperative actions needed to change the current UI so that it corresponds to what the shadow tree describes. These changes are then applied on the main thread of the application.

The twisty path that leads one to implement JavaScript

The description above of React Native's rendering pipeline applies to the so-called "new architecture", which has been in the works for some years and is only now (April 2023) starting to be deployed. The key development that has allowed React Native to move over to this architecture is tighter integration and control over its JavaScript implementation. Instead of using the platform's JavaScript engine (JavaScriptCore on iOS or V8 on Android), Facebook went and made their own whole new JavaScript implementation, Hermes. Let's step back a bit to see if we can imagine why anyone in their right mind would make a new JS implementation.

In the last article, I mentioned that the only way to get peak JS performance on iOS is to use the platform's WKWebView, which enables JIT compilation of JavaScript code. React Native doesn't want a WebView, though. I guess you could create an invisible WebView and just run your JavaScript in it, but the real issue is that the interface to the JavaScript engine is so narrow as to be insufficiently expressive. You can't cheaply synchronously create a shadow tree of layout objects, for example, because every interaction with JavaScript has to cross a process boundary.

So, it may be that JIT is just not worth paying for, if it means having to keep JavaScript at arm's length from other parts of the application. How do you do JavaScript without a browser on mobile, though? Either you use the platform's JavaScript engine, or you ship your own. It would be nice to use the same engine on iOS and Android, though. When React Native was first made, V8 wasn't able to operate in a mode that didn't JIT, so React Native went with JavaScriptCore on both platforms.

Bundling your own JavaScript engine has the nice effect that you can easily augment it with native extensions, for example to talk to the Swift or Java app that actually runs the main UI. That's what I describe above with the creation of the shadow tree, but that's not quite what the original React Native did; I can only speculate but I suspect that there was a fear that JavaScript rendering work (or garbage collection!) could be heavy enough to cause the main UI to drop frames. Phones were less powerful in 2016, and JavaScript engines were less good. So the original React Native instead ran JavaScript in a separate thread. When a render would complete, the resulting terminal element tree would be serialized as JSON and shipped over to the "native" side of the application, which would actually apply the changes.

This arrangement did work, but it ran into problems whenever the system needed synchronous communication between native and JavaScript subsystems. As I understand it, this was notably the case when React layout would need the dimensions of a native UI widget; to avoid a stall, React would assume something about the dimensions of the native UI, and then asynchronously re-layout once the actual dimensions were known. This was particularly gnarly with regards to text measurements, which depend on low-level platform-specific rendering details.

To recap: React Native had to interpret its JS on iOS and was using a "foreign" JS engine on Android, so they weren't gaining anything by using a platform JS interpreter. They would sometimes have some annoying layout jank when measuring native components. And what's more, React Native apps would still experience the same problem as Ionic / Capacitor apps, in that application startup time was dominated by parsing and compiling the JavaScript source files.

The solution to this problem was partly to switch to the so-called "new architecture", which doesn't serialize and parse so much data in the course of rendering. But the other side of it was to find a way to move parsing and compiling JavaScript to the build phase, instead of having to parse and compile JS every time the app was run. On V8, you would do this by generating a snapshot. On JavaScriptCore, which React Native used, there was no such facility. Faced with this problem and armed with Facebook's bank account, the React Native developers decided that the best solution would be to make a new JavaScript implementation optimized for ahead-of-time compilation.

The result is Hermes. If you are familiar with JavaScript engines, it is what you might expect: a JavaScript parser, originally built to match the behavior of Esprima; an SSA-based intermediate representation; a set of basic optimizations; a custom bytecode format; an interpreter to run that bytecode; a GC to manage JS objects; and so on. Of course, given the presence of eval, Hermes needs to include the parser and compiler as part of the virtual machine, but the hope is that most user code will be parsed and compiled ahead-of-time.

If this were it, I would say that Hermes seems to me to be a dead end. V8 is complete; Hermes is not. For example, Hermes doesn't have with, async function implementation has been lagging, and so on. Why Hermes when you can V8 (with snapshots), now that V8 doesn't require JIT code generation?

I thought about this for a while and in the end, given that V8's main target isn't as an embedded library in a mobile app, perhaps the binary size question is the one differentiating factor (in theory) for Hermes. By focussing on lowering distribution size, perhaps Hermes will be a compelling JS engine in its own right. In any case, Facebook can afford to keep Hermes running for a while, regardless of whether it has a competitive advantage or not.

It sounds like I'm criticising Hermes here but that's not really the point. If you can afford it, it's good to have code you control. For example one benefit that I see React Native getting from Hermes is that they control the threading model; they can mostly execute JS in its own thread, but interrupt that thread and switch to synchronous main-thread execution in response to high-priority events coming from the user. You might be able to do that with V8 at some point but the mobile-apps-with-JS domain is still in flux, so it's nice to have a sandbox that React Native developers can use to explore the system design space.

Evaluation

With that long overview out of the way, let's take a look at what kinds of performance we can expect out of a React Native system.

Startup latency

Because React Native apps have their JavaScript code pre-compiled to Hermes bytecode, we can expect that the latency imposed by JavaScript during application startup is lower than is the case with Ionic / Capacitor, which needs to parse and compile the JavaScript at run-time.

However, it must be said that as a framework, React tends to result in large application sizes and incurs significant work at startup time. One of React's strengths is that it allows development teams inside an organization to compose well: because rendering is a pure function, it's easy to break down the task of making an app into subtasks to be handled by separate groups of people. Could this strength lead to a kind of weakness, in that there is less of a need for overall coordination on the project management level, such that in the end nobody feels responsible for overall application performance? I don't know. I think the concrete differences between React Native and React (the C++ shadow object tree, the multithreading design, precompilation) could mean that React Native is closer to an optimum in the design space than React. It does seem to me though that whether a platform's primary development toolkit should be React-like remains an open question.

Jank

In theory React Native is well-positioned to avoid jank. JavaScript execution is mostly off the main UI thread. The threading model changes to allow JavaScript rendering to be pre-empted onto the main thread do make me wonder, though: what if that work takes too much time, or what if there is a GC pause during that pre-emption? I would not be surprised to see an article in the next year or two from the Hermes team about efforts to avoid GC during high-priority event processing.

Another question I would have about jank relates to interactivity. Say the user is dragging around a UI element on the screen, and the UI needs to re-layout itself. If rendering is slow, then we might expect to see a lag between UI updates and the dragging motion; the app technically isn't dropping frames, but the render can't complete in the 16 milliseconds needed for a 60 frames-per-second update frequency.

Peak perf

But why might rendering be slow? On the one side, there is the fact that Hermes is not a high-performance JavaScript implementation. It uses a simple bytecode interpreter, and will never be able to meet the performance of V8 with JIT compilation.

However the other side of this is the design of the application framework. In the limit, React suffers from the O(n) problem: any change to the application state requires the whole element tree to be recomputed. Rendering and layout work is proportional to the size of the application, which may have thousands of nodes.

Of course, React tries to minimize this work, by detecting subtrees whose layout does not change, by avoiding re-renders when state doesn't change, by minimizing the set of mutations to the native widget tree. But the native widgets aren't the problem: the programming model is, or it can be anyway.
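To make the mitigation concrete, here is a minimal sketch of the memoization idea: skip re-rendering a subtree when its props are unchanged. This is not React's actual implementation (the `memo` and `Label` names here are invented for illustration), just the shape of the technique.

```javascript
// Sketch of props-based memoization: reuse the previously rendered
// subtree when the props are shallowly equal to last time.
function memo(render) {
  let lastProps = null;
  let lastResult = null;
  return (props) => {
    const same = lastProps !== null
      && Object.keys(props).length === Object.keys(lastProps).length
      && Object.keys(props).every((k) => props[k] === lastProps[k]);
    if (same) return lastResult;   // skip the re-render entirely
    lastProps = props;
    lastResult = render(props);
    return lastResult;
  };
}

let renders = 0;
const Label = memo(({ text }) => {
  renders += 1;
  return { type: "text", text };
});

Label({ text: "hi" });
Label({ text: "hi" });   // identical props: cached result, no re-render
Label({ text: "bye" });  // props changed: renders again
console.log(renders);    // 2
```

Multiply this by thousands of components and you can see both why the optimization matters and why it doesn't change the asymptotics: the framework still has to visit each node to decide whether its props changed.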

Aside: As good as native?

Again in theory, React Native can be used to write apps that are as good as if they were written directly against platform-native APIs in Kotlin or Swift, because it uses the same platform UI toolkits as native applications. React Native can also do this at the same time as being cross-platform, targeting iOS and Android with the same code. In practice, besides the challenge of designing suitable cross-platform abstractions, React Native has to grapple with potential performance and memory use overheads of JavaScript, but the result has the potential to be quite satisfactory.

Aside: Haven't I seen that rendering model somewhere?

As I mentioned in the last article, I am a compiler engineer, not a UI specialist. In the course of my work I do interact with a number of colleagues working on graphics and user interfaces, notably in the context of browser engines. I was struck when reading about React Native's rendering pipeline about how much it resembled what a browser itself will do as part of the layout, paint, and render pipeline: translate a tree of objects to a tree of immutable layout objects, clip those to the viewport, paint the ones that are dirty, and composite the resulting textures to the screen.

It's funny to think about how many levels we have here: the element tree, the recursively expanded terminal element tree, the shadow object tree, the platform-native widget tree, surely a corresponding platform-native layout tree, and then the GPU backing buffers that are eventually composited together for the user to see. Could we do better? I could certainly imagine any of these mobile application development frameworks switching to their own Metal/Vulkan-based rendering architecture at some point, to flatten out these layers.

Summary

By all accounts, React Native is a real delight to program for; it makes developers happy. The challenge is to make it perform well for users. With its new rendering architecture based on Hermes, React Native may well be on the path to addressing many of these problems. Bytecode pre-compilation should go a long way towards solving startup latency, provided that React's expands-to-fit-all-available-space tendency is kept in check.

If you were designing a new mobile operating system from the ground up, though, I am not sure that you would necessarily end up with React Native as it is. At the very least, you would include Hermes and the base run-time as part of your standard library, so that every app doesn't have to incur the space costs of shipping the run-time. Also, in the same way that Android can ahead-of-time and just-in-time compile its bytecode, I would expect that a mobile operating system based on React Native would extend its compiler with on-device post-install compilation and possibly JIT compilation as well. And at that point, why not switch back to V8?

Well, that's food for thought. Next up, NativeScript. Until then, happy hacking!

structure and interpretation of capacitor programs

20 April 2023 10:20 AM (igalia | compilers | mobile | ionic | capacitor | react | react native | nativescript | flutter | dart | javascript)

Good day, hackers! Today's note is a bit of a departure from compilers internals. A client at work recently asked me to look into cross-platform mobile application development and is happy for the results to be shared publicly. This, then, is the first in a series of articles.

Mobile apps and JavaScript: how does it work?

I'll be starting by taking a look at Ionic/Capacitor, React Native, NativeScript, Flutter/Dart, and then a mystery guest. This article will set the stage and then look into Ionic/Capacitor.

The angle I am taking is, if you were designing a new mobile operating system that uses JavaScript as its native application development language, what would it mean to adopt one of these as your primary app development toolkit? It's a broad question but I hope we can come up with some useful conclusions.

I'm going to approach the problem from the perspective of a compiler engineer. Compilers translate high-level languages that people like to write into low-level languages that machines like to run. Compilation is a bridge between developers and users: developers experience the source language interface to the compiler, while users experience the result of compiling those source languages. Note, though, that my expertise is mostly on the language and compiler side of things; though I have worked on a few user-interface libraries in my time, I am certainly not a graphics specialist, nor a user-experience specialist.

I'm explicitly not including the native application development toolkits from Android and iOS, because these are most useful to me as a kind of "control" to the experiment. A cross-platform toolkit that compiles down to use native widgets can only be as good as the official SDK, but it might be worse; and a toolkit that routes around the official widgets by using alternate drawing primitives has the possibility to be better, but at the risk of being inconsistent with other platform apps, along with the more general risk of not being as good as the native UI libraries.

And, of course, there is the irritation that SwiftUI / UIKit / AppKit aren't open source, and that neither iOS nor Android's native UI library is cross platform. If we are designing a new JavaScript-based mobile operating system—that's the conceit of the original problem statement—then we might as well focus on the existing JS-based toolkits. (Flutter of course is the odd one out, but it's interesting enough to include in the investigation.)

Ionic / Capacitor

Let's begin with the Ionic Framework UI toolkit, based on the Capacitor web run-time.

Overview

Capacitor is like an alternate web browser development kit for phones. It uses the platform's native WebView: WKWebView on iOS or WebView on Android. The JavaScript APIs available within that WebView are extended with Capacitor plugins which can access native APIs; these plugins let JavaScript do the sort of things you wouldn't be able to do directly in a web browser.

(Over time, most of the features provided by Capacitor end up in the web platform in some way. For example, there are web standards to allow JavaScript to detect or lock screen rotation. The Fugu project is particularly avant-garde in pushing these capabilities to the web. But, the wheel of web standards grinds very finely, and for an app to have access to, say, the user's contact list now, it is pragmatic to use an extended-webview solution like Capacitor.)

I call Capacitor a "web browser development kit" because creating a Capacitor project effectively copies the shell of a specialized browser into your source tree, usually with an iOS and Android version. Building the app makes a web browser with your extensions that runs your JS, CSS, image, and other assets. Running the app creates a WebView and invokes your JavaScript; your JS should then create the user interface using the DOM APIs that are standard in a WebView.

As you can see, Capacitor is quite full-featured when it comes to native platform capabilities but is a bit bare-bones when it comes to UI. To provide a richer development environment, the Ionic Framework adds a set of widgets and some application life-cycle abstractions on top of the Capacitor WebView. Ionic is not like a "normal" app framework like Vue or Angular because it takes advantage of Capacitor APIs, and because it tries to mimic the presentation and behavior of the native platform (iOS or Android, mainly), so that apps built with it don't look out-of-place.

The Ionic Framework itself is built using Web Components, which end up creating styled DOM nodes with JavaScript behavior. These components can compose with other app frameworks such as Vue or Angular, allowing developers to share some of their app code between web and mobile apps---you write the main app in Vue, and just implement the components one way on the web and using something like Ionic Framework on mobile.

To recap, an Ionic app uses Ionic Framework libraries on top of a Capacitor run-time. You could use other libraries on top of Capacitor instead of Ionic. In any case, I'm most interested in Capacitor, so let's continue our focus there.

Evaluation

Performance has a direct effect on the experience that users have when interacting with an application, and here I would like to break down the performance in three ways: one, startup latency; two, jank; and three, peak throughput. As Capacitor is structurally much like a web browser, many of these concerns, pitfalls, and mitigation techniques are similar to those on the web.

Startup latency

Startup latency is how long the app makes you wait after you run it before you can interact with it. The app may hide some of this work behind a splash screen, but if there is too much work to do at startup-time, the risk is the user might get bored or frustrated and switch away. The goal, therefore, is to minimize startup latency. Most people consider time-to-interactive of less than a second to be sufficient; there's room to improve but for better or for worse, user expectations here are lower than they could be.

In the case of Capacitor, when a user launches an application, the app loads a minimal skeleton program that loads a WebView. On iOS and most Android systems, the rendering and JS parts of the WebView run in a separate process that has access to the app's bundled assets: JavaScript files, images, CSS, and so on. The WebView then executes the JavaScript main function to create the UI.

Capacitor application startup time will be dominated by parsing and compiling the JavaScript source files, as well as any initial work needed to boot the app framework (which could require running a significant amount of JS), and will depend on how fast the user's device is. The most effective performance mitigation here is to reduce the amount of JavaScript loaded, but this can be difficult for complex apps. The main techniques to reduce app size are toolchain-based. Tree-shaking, bundling, and minification reduce the number of JS bytes that the engine needs to parse without changing application logic. Code splitting can defer some work until it is needed, but this might just result in jank later on.

Common Ionic application sizes would seem to be between 500 kB and 5 MB of JavaScript. In 2021, Alex Russell suggested that the soon-to-be standard Android performance baseline should be the Moto E7 Plus which, if its performance relative to the mid-range Moto G4 phone that he measured in 2019 translates to JavaScript engine speed, should be able to parse and compile uncompressed JavaScript at a rate of about 1 MB/s. That's not very much, if you want to get startup latency under a second, and a JS payload size of 5 MB would lead to 5-second startup delays for many users. To my mind, this is the biggest challenge for a Capacitor-based app.

(Calculation detail: The Moto G4 JavaScript parse-and-compile throughput is measured at 170 kB/s, for compressed JS. Assuming a compression ratio of 4 and that the Moto E7 Plus is 50% faster than the Moto G4, that gets us to 1 MB/s for uncompressed JS, for what was projected to be a performance baseline in 2023.)
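The back-of-the-envelope arithmetic from the parenthetical above can be written out directly; note that the throughput figures are the article's stated assumptions, not fresh measurements.

```javascript
// Startup-latency estimate, using the assumptions stated in the text.
const motoG4CompressedKBps = 170;  // measured parse+compile rate, compressed JS
const compressionRatio = 4;        // assumed compression ratio
const e7PlusSpeedup = 1.5;         // assumed: Moto E7 Plus is 50% faster

const uncompressedKBps =
  motoG4CompressedKBps * compressionRatio * e7PlusSpeedup;
console.log(uncompressedKBps);     // 1020, i.e. about 1 MB/s of uncompressed JS

const appSizeKB = 5 * 1024;        // a 5 MB JavaScript payload
const parseSeconds = appSizeKB / uncompressedKBps;
console.log(parseSeconds.toFixed(1)); // roughly 5 seconds just to parse+compile
```

So even before any application logic runs, a 5 MB bundle can eat the whole startup-latency budget several times over on a baseline device.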

Jank

Jank is when an application's animation and interaction aren't smooth, because the application somehow missed rendering one or more frames. Technically startup time is a form of jank, but it is useful to treat startup separately.

Generally speaking, the question of whether a Capacitor application will show jank depends on who is doing the rendering: if application JavaScript is driving an animation, then this could cause jank, in the same way as it would on the web. The mitigation is to lean on the web platform so that the WebView is the one in charge of ensuring smooth interaction, for example by using CSS animations or the native scrolling capability.

There may always be an impetus to do some animation in JavaScript, though, for example if the design guidelines require a specific behavior that can't be created using raw CSS.

Peak perf

Peak throughput is how much work an app can do per unit time. For an application written in JavaScript, this is a question of how fast the JavaScript engine can make a user's code run.

Here we need to make an important aside on the strategies that a JavaScript engine uses to run a user's code. A standard technique that JS engines use to make JavaScript run fast is just-in-time (JIT) compilation, in which the engine emits specialized machine code at run-time and then runs that code. However, on iOS the platform prohibits JIT code generation for most applications. The usual justification for this restriction is that the low-level capability granted to a program that allows it to JIT-compile increases the likelihood of security exploits. The only current exception to the no-JIT policy on iOS is for WebView (including the one in the system web browser), which iOS maintainers consider to be sufficiently sandboxed, so that any security exploit that uses JIT code generation won't be able to access other capabilities possessed by the application process.

So much for the motivation, as I understand it. The upshot is, there is no JIT on iOS outside web browsers or web views. If a cross-platform application development toolkit wants peak JavaScript performance on iOS, it has to run that JS in a WebView. Capacitor does this, so it has fast JS on iOS. Note that these restrictions are not in place on Android; you can have fast JS on Android without using a WebView, by using the system's JavaScript libraries.

Of course, actually getting peak throughput out of JavaScript is an art but the well-known techniques for JavaScript-on-the-web apply here; we won't cover them in this series.

The bridge

All of these performance observations are common to all web browsers, but with Capacitor there is the additional wrinkle that a Capacitor application can access native capabilities, for example access to a user's contacts. When application JavaScript goes to access the contacts API, Capacitor will send a message to the native side of the application over a bridge. The message will be serialized to JSON. Capacitor will include an implementation for the native side of the bridge in your app's source code, written in Swift for iOS or Java for Android. The native end of the bridge parses the JSON message, performs the requested action, and sends a message back with the result. Some messages may need to be proxied to the main application thread, because some native APIs can only be processed there.
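The round trip described above can be sketched in miniature. This is not Capacitor's actual API (the `callPlugin` and `nativeSide` names are invented, and the native side here is a mock running in-process); it just shows the serialize / dispatch / deserialize shape of a bridge call.

```javascript
// Mock "native" side: receives a JSON string, parses it, acts, replies.
function nativeSide(messageJson) {
  const msg = JSON.parse(messageJson);
  if (msg.plugin === "Contacts" && msg.method === "getContacts") {
    return JSON.stringify({ id: msg.id, result: [{ name: "Ada" }] });
  }
  return JSON.stringify({ id: msg.id, error: "unknown method" });
}

// JS side of the bridge: serialize the request, cross the (mock)
// process boundary asynchronously, deserialize the reply.
let nextId = 0;
function callPlugin(plugin, method, args) {
  const id = nextId++;
  const request = JSON.stringify({ id, plugin, method, args });
  return Promise.resolve(nativeSide(request)).then((raw) => {
    const reply = JSON.parse(raw);
    if (reply.error) throw new Error(reply.error);
    return reply.result;
  });
}

callPlugin("Contacts", "getContacts", {}).then((contacts) => {
  console.log(contacts[0].name); // "Ada"
});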

As you can imagine, the bridge has an overhead. Apps that have high-bandwidth access to native capabilities will also have JSON encoding overhead, as well as general asynchronous coordination overhead. It may even be possible that encoding or decoding a large JSON message causes the WebView to miss a frame, for example when accessing a large file in local storage.

The bridge is a necessary component to the design, though; an iOS WebView can't have direct in-process access to native capabilities. For Android, the WebView APIs do not appear to allow this either, though it is theoretically possible to ship your own WebView that could access native APIs directly. In any case, the Capacitor multi-process solution does allow for some parallelism, and the enforced asynchronous nature of the APIs should lead to less modal application programming.

Aside: Can WebView act like native widgets?

Besides performance, one way that users experience application behavior is by way of expectations: users expect (but don't require) a degree of uniformity between different apps on the same platform, for example the same scrolling behavior or the same kinds of widgets. This aspect isn't the most important one to my investigation, because I'm more concerned with a new operating system that might come with its own design language with its own standard JavaScript-based component library, but it's one to note; an Ionic app is starting at a disadvantage relative to a platform-native app. Sometimes ensuring a platform-native UI is just a question of CSS styling and using the right library (like Ionic Framework), but sometimes it might take significant engineering or possibly even new additions to the web platform.

Aside: Is a WebView as good as native widgets?

As a final observation on user experience, it's worth asking the question of whether you can actually make a user interface using the web platform that is "as good" as native widgets. After all, the API and widget set (DOM) available to a WebView is very different from that available to a native application; is a WebView able to use CPU and GPU resources efficiently to create a smooth interface? Are web standards, CSS, and the DOM API somehow a constraint that prevents efficient user interface construction? Here, somewhat embarrassingly, I can't actually say. I am instead going to just play the "compiler engineer" card, that I'm not a UI specialist; to me it is an open question.

Summary

Capacitor, with or without the Ionic Framework on top, is the always-bet-on-the-web solution to the cross-platform mobile application development problem. It uses stock system WebViews and JavaScript, augmented with native capabilities as needed. Consistency with the UX of the specific platform (iOS or Android) may require significant investment, though.

Performance-wise, my evaluation is that Capacitor apps are well-positioned for peak JS performance due to JIT code generation and low jank due to offloading animations to the web platform, but may have problems with startup latency, as Capacitor needs to parse and compile a possibly significant amount of JavaScript each time it is run.

Next time, a different beast: React Native. Until then, happy hacking!

sticking point

18 April 2023 8:23 PM (meta | hackjam | guile | gc | hegel)

Good evening, gentle readers. A brief note tonight, on a sticky place.

See, I have too many projects right now.

In and of itself this is not so much of a problem as a condition. I know my limits; I keep myself from burning out by shedding load, and there is a kind of priority list of which projects keep adequate service levels.

First come the tiny humans that are in my care who need their butts wiped and bodies translated to and from school or daycare and who -- well you know the old Hegelian trope, that the dialectic crank of history doesn't turn itself, that it takes actions from people to synthesize the thesis and the antithesis, and that even History itself isn't always monotonic; in the same way, bedtime is a reality, there are the material conditions of sleepiness and you're-gonna-be-so-tired-tomorrow but without the World-Historical Actor whose name is Dada ain't nobody getting into pyjamas, not to mention getting in bed and going to sleep.

Lower in the priority queue there is work, and work is precious: a whole day, a day of just thinking about things and solving problems; yes, I wish I could choose all of the problems that I work on but the simple fact of being able to think and solve is a gift, or rather a reward, time stolen from the arbitrary chaotic preliterate interruptors. I love my kids -- and here I have to choose my conjunction wisely -- and also they love me. Which is nice! They express this through wanting to sit on my lap and demand that I read to them when I am thinking about things. We all have our love languages, I suppose.

And the house? There are 5 of us now, and that's, you know, more than 5 times the work it was before because the 3 wee ones are more chaotic than the adults. There is so much laundry. We are now a cook-the-whole-500g-bag-of-pasta family, and we're far from the teenager stage.

Finally there are the weird side projects, of which there are a few. I have some sporty things I do that I can't avoid because I am socially coerced into doing them; all in all a fine configuration. I have some volunteer work, which has parts I like that take time, which I am fine with, and parts I don't that I neglect. There's the garden, also neglected but nice to muck about in sometimes. Sometimes I see friends? Rarely, so rarely, but sometimes.

But, netfriends, if I find a free couple hours, I hack. Not less than a couple hours, because otherwise I can't get into things. More? Yes please! But usually no, because priorities.

I write this note because I realized that in the last month or so, hain't been no hacktime; there's barely 30 minutes between the end of the evening kitchen-clean and the onset of zzzz. That's fine, really; ultimately I need to carve out work time for this sort of thing, somehow or other.

But I was also realizing that I painted myself into a corner with my GC work. See, I need some new benchmarks, and I can't afford to use Guile itself as my benchmark yet, because I can't afford to fix deep bugs right now. I need something small. Specifically I need a little language implementation with efficient GC integration. I was looking at porting the little Scheme implementation from my Wasm JIT work to C -- because the rest of whippet is in C -- but is that a good enough reason? I got mired in the swamps of precise GC roots in C and let me tell you, it is disgusting. You don't have RAII like you do in C++, without GCC extensions. You don't have stack maps, like you would like. I might be able to push through but it's the sort of project you need a day for, and those days have not been forthcoming recently.

Anyway, just a little hack log tonight, not detailing a success, but rather a condition. Perhaps observing the position of this logjam will change its velocity. Here's hoping, and until next time, happy hacking!