Friday, November 3, 2017

Continuations and type systems

How do we get a programming language that combines all the type features we have seen: linear types, dependent types, subtypes, continuations rather than functions, jumbo types, and so on?

First we must recall the purpose of a type system: at least at a low level, types are calling conventions. The type of a function, module, etc. is its interface, its API. A function may have multiple types and hence multiple calling conventions; but at runtime presumably only a few of these will be used. For efficiency reasons it makes sense to have only one true type and construct all the other types as wrappers, but this is by no means necessary.
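As a rough illustration (the types and names here are mine, made up for the example), one underlying implementation can be wrapped to present several different interfaces:

    newtype Meters = Meters Double
    newtype Feet   = Feet Double

    -- The one "true" implementation, working on the raw representation.
    addRaw :: Double -> Double -> Double
    addRaw = (+)

    -- Wrappers exposing alternative types (calling conventions) over it.
    addMeters :: Meters -> Meters -> Meters
    addMeters (Meters a) (Meters b) = Meters (addRaw a b)

    addFeet :: Feet -> Feet -> Feet
    addFeet (Feet a) (Feet b) = Feet (addRaw a b)

    main :: IO ()
    main = print (addRaw 1 2)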

When we have a C type, for example int main(int argc, char **argv), this is actually a very strong type; the C standard and implementation specify many details of memory layout, stack usage, etc. that are assumed to hold. The interface with the kernel is a cleaner type: execve simply loads the proper memory layout and jumps to the starting address. To represent this API we must use continuations instead of functions, since control never returns a value to the caller. After looking at IL we might denote this as the type ¬kernel. Expanding the type kernel out, it might look more like ¬(stack pointer -> {argc, argv, env}), at least on Linux. Continuations map nicely to the power of assembly language, where jumping to an unknown code segment gives up all control. If a C 'function' calls exit() it will never return. In IL there is also a 0 type, the type of 'enterable locations'. From the rules, the only values of type 0 look like (λ x.F) T, where F is another value of type 0, and the only values of type ¬T look like λ x.(E1 E2). It's not clear what this is useful for.
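Here is a minimal Haskell sketch of that reading; Args, NotArgs, and entry are invented names, and IO () stands in for "never returns a value to the caller":

    import System.Exit (ExitCode (ExitSuccess), exitWith)

    -- The argument layout the kernel hands to a new process (simplified).
    data Args = Args { argc :: Int, argv :: [String], env :: [(String, String)] }

    -- ¬Args: consumes an Args and never hands a value back to a caller.
    type NotArgs = Args -> IO ()

    entry :: NotArgs
    entry (Args _ args _) = do
      mapM_ putStrLn args
      exitWith ExitSuccess   -- control leaves here and never comes back

    main :: IO ()
    main = entry (Args 2 ["hello", "world"] [])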

At a high level, though, the idea of contracts remains. We have properties of our programs that we wish to remain true, like memory safety. Linear types are one method of enforcing these: we reject programs that don't use resources linearly. Dependent types are another form of contract, allowing us to write pre- and post-conditions on values.
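For instance, a linear type can enforce that a resource is consumed exactly once; a small sketch, assuming GHC 9.0+ with the LinearTypes extension (Token and consume are illustrative names):

    {-# LANGUAGE LinearTypes #-}

    -- A resource that must be consumed exactly once.
    newtype Token = Token Int

    consume :: Token %1 -> Int
    consume (Token n) = n

    -- Rejected by the checker: the linear argument would be used twice.
    -- duplicate :: Token %1 -> (Int, Int)
    -- duplicate t = (consume t, consume t)

    main :: IO ()
    main = print (consume (Token 42))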

Following the IL paper, and its related cousin TACC, we can define types of continuations - roughly, the type of a continuation is the argument layout it expects to receive. We extend our data types to the Jumbo Connectives of arbitrary records and sum types (the Pi type can be ignored; as an intermediate-language primitive, continuations are more expressive than functions). We also have to generalize negation to delimited continuations, following Zeilberger and a more recent paper. We can implement these on LLVM in two ways: first, by translating them away into standard function calls, and second, by using a special calling convention, jump-with-argument.
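A tiny sketch of the first strategy, translating continuations away into ordinary calls, with ¬a approximated as a -> r for some fixed answer type r (the encoding and names are mine):

    -- ¬a rendered as "a function from a to some fixed answer type r".
    type Not r a = a -> r

    -- A direct-style function and its continuation-passing counterpart.
    square :: Int -> Int
    square x = x * x

    squareCPS :: Int -> Not r Int -> r
    squareCPS x k = k (x * x)   -- "jump" to the return continuation

    main :: IO ()
    main = squareCPS 5 print    -- prints 25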

How do we combine linear types with continuations? Following this paper we might introduce multiple types of continuations, e.g. the paper's lowered negation ↓¬ and boxed negation !¬. Then linear function types translate as !¬(τ1 & ↓¬τ2) while normal function types translate as !¬(τ1 & !¬τ2). But this hides the Rig (semiring) structure of the context. Maybe instead we could use a numbered continuation ¬ₖ, where k is from our Rig. Then adding dependent types is easy, and subtypes are just polarizing all the information flow. Simple!
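Rendering those translations in Haskell with the ! and ↓ modalities erased (so the linear/normal distinction disappears), both share the shape ¬(τ1 & ¬τ2); this is only my approximation of the encoding:

    -- ¬a approximated as a -> r; the modalities are not tracked here.
    type Not r a = a -> r

    -- A "function" is a continuation expecting the argument paired with
    -- the return continuation.
    type Fun r a b = (a, Not r b) -> r

    -- Build such a function from an ordinary one.
    fromPlain :: (a -> b) -> Fun r a b
    fromPlain f (x, k) = k (f x)

    -- Call it by supplying an argument and a return continuation.
    apply :: Fun r a b -> a -> Not r b -> r
    apply f x k = f (x, k)

    main :: IO ()
    main = apply (fromPlain (+ 1)) 41 print   -- prints 42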

Papers by Appel (1, 2, 3, 4) describe a closure strategy analysis, wherein known functions are compiled using registers but unknown functions use the heap. They do not use the stack because it inefficiently holds on to values even after they are no longer needed; deallocating them early would require making holes in the stack. But of course nothing in life is free; as soon as one gives up the stack, one needs it again for interfacing with other (C) code.
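A rough Haskell rendering of the distinction (my own names, not Appel's): an unknown call goes through a heap-allocated closure record, while a known call just takes its free variables as extra arguments:

    -- Unknown functions: code paired with a heap-allocated environment.
    data Closure env a b = Closure { code :: env -> a -> b, environment :: env }

    applyUnknown :: Closure env a b -> a -> b
    applyUnknown (Closure f e) x = f e x

    -- Known functions: the free variable y is passed directly
    -- ("in a register"), no closure record needed.
    knownAdd :: Int -> Int -> Int
    knownAdd y x = x + y

    main :: IO ()
    main = do
      print (knownAdd 10 32)                                 -- known call
      print (applyUnknown (Closure (\y x -> x + y) 10) 32)   -- unknown call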

Thursday, November 2, 2017

Incremental build farms

Build farms are expensive to set up and maintain. Or are they? Amazon gives away free instance-years, and services such as Travis and AppVeyor build open-source projects for free. Can these services be used to bootstrap a Linux distribution?

The Nix answer to these questions is to use a binary cache - a server in the middle that stores cryptographically signed binaries. It can start as simply as a single node with a Nix store, then expand to use Amazon S3 or another large-scale filesystem.
But why start there? We could instead start with a distributed filesystem, like BitTorrent's DHT or IPFS. Then sharing files is as easy as sharing the root location hash, and all we need is a simple GitHub Pages site that gives out the root hash for everyone to synchronize on.
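Both designs boil down to a content-addressed store plus one well-known root; a toy sketch of the lookup path (the names and layout are mine, not Nix's or IPFS's):

    import qualified Data.Map as Map

    type Hash = String

    -- A content-addressed store: artifacts keyed by their hash.
    type Store = Map.Map Hash String

    -- The "root": an index from package names to artifact hashes. Sharing
    -- this one index is enough for everyone to synchronize.
    type Index = Map.Map String Hash

    lookupPackage :: Store -> Index -> String -> Maybe String
    lookupPackage store index name = do
      h <- Map.lookup name index
      Map.lookup h store

    main :: IO ()
    main = print (lookupPackage store index "hello")
      where
        store = Map.fromList [("sha256-abc", "hello-2.10 binary")]
        index = Map.fromList [("hello", "sha256-abc")]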

The issue of trust still remains, though. Who builds what? How does the hash get calculated? Is there a cryptocoin (Filecoin)?

Another approach starts from a single moderately powerful system; Gentoo is a source-based Linux distribution that can usually be compiled and installed overnight. Speeding up Gentoo (re)compilations is a worthy goal. But then how do we go from one system to a network of systems sharing binaries? Gentoo (Portage) actually supports binary packages, but in practice this feature seems to never be used publicly; instead Gentoo has a centralized system of mirrors for distfiles. This is probably due to culture and to limits on the compatibility of binary packages; Gentoo is usually set up to compile packages for the specific CPU it's running on, as a side effect of too many USE flags ("Gentoo is for ricers"). But adopting sane defaults like Arch's shouldn't be too hard, and proper dependency management should alleviate the remaining compatibility and configurability concerns.