spectre and the end of langsec
I remember in 2008 seeing Gerald Sussman, creator of the Scheme language, resignedly describing a sea change in the MIT computer science curriculum. In response to a question from the audience, he said:
The work of engineers used to be about taking small parts that they understood entirely and using simple techniques to compose them into larger things that do what they want.
But programming now isn't so much like that. Nowadays you muck around with incomprehensible or nonexistent man pages for software you don't know who wrote. You have to do basic science on your libraries to see how they work, trying out different inputs and seeing how the code reacts. This is a fundamentally different job.
Like many I was profoundly saddened by this analysis. I want to believe in constructive correctness, in math and in proofs. And so with the rise of functional programming, I thought that this historical slide from reason towards observation was just that, historical, and that the "safe" languages had a compelling value that would be evident eventually: that "another world is possible".
In particular I found solace in "langsec", an approach to assessing and ensuring system security in terms of constructively correct programs. One obvious application is parsing of untrusted input, and indeed the langsec.org website appears to emphasize this domain as one in which a programming languages approach can be fruitful. It is, after all, a truth universally acknowledged, that a program with good use of data types, will be free from many common bugs. So far so good, and so far so successful.
The basis of language security is starting from a programming language with a well-defined, easy-to-understand semantics. From there you can prove (formally or informally) interesting security properties about particular programs. For example, if a program has a secret k, but some untrusted subcomponent C of it should not have access to k, one can prove if k can or cannot leak to C. This approach is taken, for example, by Google's Caja compiler to isolate components from each other, even when they run in the context of the same web page.
What's worse, we need to do basic science to come up with adequate mitigations to the Spectre vulnerabilities (side-channel exfiltration of results of speculative execution). Retpolines, poisons and masks, et cetera: none of these are proven to work. They are simply observed to be effective on current hardware. Indeed mitigations are anathema to the correctness-by-construction: if you can prove that a problem doesn't exist, what is there to mitigate?
Spectre is not the first crack in the edifice of practical program correctness. In particular, timing side channels are rarely captured in language semantics. But I think it's fair to say that Spectre is the most devastating vulnerability in the langsec approach to security that has ever been uncovered.
Where do we go from here? I see but two options. One is to attempt to make the behavior of the machines targetted by secure language implementations behave rigorously as architecturally specified, and in no other way. This is the approach taken by all of the deployed mitigations (retpolines, poisoned pointers, masked accesses): modify the compiler and runtime to prevent the CPU from speculating through vulnerable indirect branches (prevent speculative execution), or from using fetched values in further speculative fetches (prevent this particular side channel). I think we are missing a model and a proof that these mitigations restore target architectural semantics, though.
However if we did have a model of what a CPU does, we have another opportunity, which is to incorporate that model in a semantics of the target language of a compiler (e.g. micro-x86 versus x86). It could be that this model produces a co-evolution of the target architectures as well, whereby Intel decides to disclose and expose more of its microarchitecture to user code. Cacheing and other microarchitectural side-effects would then become explicit rather than transparent.
Rich Hickey has this thing where he talks about "simple versus easy". Both of them sound good but for him, only "simple" is good whereas "easy" is bad. It's the sort of subjective distinction that can lead to an endless string of Worse Is Better Is Worse Bourbaki papers, according to the perspective of the author. Anyway transparent caching in the CPU has been marvelously easy for most application developers and fantastically beneficial from a performance perspective. People needing constant-time operations have complained, of course, but that kind of person always complains. Could it be, though, that actually there is some other, better-is-better kind of simplicity that should replace the all-pervasive, now-treacherous transparent cacheing?
I don't know. All I will say is that an ad-hoc approach to determining which branches and loads are safe and which are not is not a plan that inspires confidence. Godspeed to the langsec faithful in these dark times.
Comments are closed.