Wednesday, January 2, 2019

Project Stroscot

Using a Markov name generator, pasting in names from GitHub, Esolangs, and Wikipedia along with some jokes, I came up with the name Stroscot for my next programming language. I have created a GitHub repo and a subreddit for it.

Any language design should first take into consideration the fate of ALGOL 68. Years of committee meetings produced an incomprehensible and almost unusable specification, defining numerous overspecialized features in exacting detail, often in a precise but impenetrable formalism. The goal of a language is to assist the programmer in "the reliable creation of sophisticated programs". Hence there are three tests for a language: Can you write the program? Can you write it quickly? Can you write it correctly?

Declarative programming is basically dead, aside from SQL and maybe Prolog; even SQL doesn't look that declarative anymore, since it's essentially functional programming in disguise with a little query planner bolted on. Prolog is a bit more interesting, but at this point it's practically synonymous with depth-first SLD resolution, and everybody who writes Prolog assumes those execution semantics; hence it's really just an interesting syntax for imperative programming with some backtracking thrown in. Anyway, since "declarative programming" is defined as "anything that's not imperative", and declarative programming is dead, Stroscot will be an imperative programming language.

As far as types go, I'm still undecided. I definitely want to implement a lot of static analysis passes, but maybe types themselves are a decoy: type inference is basically a limited form of flow and liveness analysis. Gradual typing and type inference are ideas floating around, though.
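
To make the "inference as flow analysis" view concrete, here is a toy sketch (the mini-IR, the type lattice, and all names are invented for this example, not part of any design): types are just abstract values propagated forward over straight-line code, with a join at every definition.

    #include <stdio.h>

    /* Toy type lattice: UNKNOWN below INT and FLOAT, ERROR on top. */
    typedef enum { T_UNKNOWN, T_INT, T_FLOAT, T_ERROR } ty;

    /* Three-address instruction: dst = lhs (op) rhs, variables named 'a'..'z'. */
    typedef struct { char dst, lhs, rhs; } instr;

    static ty join(ty a, ty b) {
        if (a == T_UNKNOWN) return b;
        if (b == T_UNKNOWN) return a;
        return a == b ? a : T_ERROR;   /* no implicit coercions in this toy */
    }

    int main(void) {
        static const char *name[] = { "unknown", "int", "float", "error" };
        ty env[26] = { T_UNKNOWN };
        env['a' - 'a'] = T_INT;        /* a : int   */
        env['b' - 'a'] = T_FLOAT;      /* b : float */

        instr prog[] = { { 'c', 'a', 'a' },     /* c = a + a */
                         { 'd', 'c', 'b' } };   /* d = c * b */

        /* Forward pass: each definition gets the join of its operands' types. */
        for (int i = 0; i < 2; i++) {
            env[prog[i].dst - 'a'] = join(env[prog[i].lhs - 'a'], env[prog[i].rhs - 'a']);
            printf("%c : %s\n", prog[i].dst, name[env[prog[i].dst - 'a']]);
        }
        return 0;   /* prints "c : int" and "d : error" */
    }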

In the medium term, generating reasonably efficient compiled code is a goal. Emitting assembly directly is a mess, although Intel maintains this x86 instruction library in C. Stone Knife Forth is a good start along those lines, and Potion is similar with a hard-coded JIT. There are also mid-level JIT libraries: Zed Shaw wrote a little prototype, EaRing, on top of GNU Lightning (slides), but GNU Lightning is basically dead and really crufty, so a new attempt should use libjit or something similar (recommended here). The most recent option is probably WebAssembly, but I'm not sure how useful targeting that would be. GHC and friends are experimenting with LLVM; LLVM was designed with C++ in mind, though, so it's not really flexible enough as a general assembly substitute. But Kavon Farvardin is working on that.
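
For concreteness, this is roughly what the Potion / Stone Knife Forth "hard-coded JIT" approach boils down to; a minimal sketch assuming x86-64 and the System V calling convention (real code would want a proper encoder, W^X handling, and error checking):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* mov eax, edi ; add eax, eax ; ret  -- returns twice its argument */
        unsigned char code[] = { 0x89, 0xf8, 0x01, 0xc0, 0xc3 };

        /* Allocate a page we can write to and execute from. */
        void *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) return 1;
        memcpy(mem, code, sizeof code);

        /* Jump into the freshly generated machine code. */
        int (*twice)(int) = (int (*)(int))mem;
        printf("%d\n", twice(21));   /* prints 42 */

        munmap(mem, 4096);
        return 0;
    }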

The other big issue with compilation is memory management; implementing garbage collectors is a pain, and I haven't really researched them. So for now I'll focus on just an interpreter, since then I can use the host language's GC. Alternatively, this micro virtual machines project looks interesting; at the very least they have a Rust implementation of the Immix GC. Skipping GC and building a memory-safe language is also a possibility, but borrow checking seems hard: this guy spent a year on "Gradual Memory Management" and still can barely compile a simple C-like language. To be fair, he's working on a harder problem; Rust can't even do a doubly-linked list without dropping into unsafe code or adding reference-counting overhead, which is basically doing GC yourself. Still another possibility is to just use malloc, free, and the various specialized allocators like normal C programmers do, and then maybe add static or dynamic analysis to detect errors. Tracing GC is invasive, so supporting explicit and managed memory in the same system is still mostly an open research problem. But explicit is efficient: see this paper on "Destination Passing Style" memory management, where the optimizer produces the fastest C code in their benchmarks.
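
To make the destination-passing idea concrete, here is a small hand-written sketch (function names are mine, not from the paper): the caller supplies the output buffer, so a chain of operations can reuse one destination instead of allocating an intermediate result per call.

    #include <stdlib.h>

    /* Allocating style: every call produces a fresh buffer the caller must free. */
    double *vec_add_alloc(size_t n, const double *a, const double *b) {
        double *r = malloc(n * sizeof *r);
        for (size_t i = 0; i < n; i++) r[i] = a[i] + b[i];
        return r;
    }

    /* Destination-passing style: the result goes into caller-provided memory. */
    void vec_add_into(size_t n, double *dst, const double *a, const double *b) {
        for (size_t i = 0; i < n; i++) dst[i] = a[i] + b[i];
    }

    /* The caller controls allocation and can fuse a pipeline into one buffer. */
    void x_plus_2y(size_t n, double *out, const double *x, const double *y) {
        vec_add_into(n, out, x, y);     /* out = x + y                        */
        vec_add_into(n, out, out, y);   /* out = x + 2y, no temporary needed  */
    }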

Henri Tuhola describes compiling via CPS as "insanely effective", which seems hard to argue with. There's a CPS-to-SSA pass here. According to this mail, SSA uses a DAG to represent scoping while CPS uses a tree plus parameter passing, hence CSE is harder in CPS. Some CPS constructs can't be translated to SSA though, mostly ones generated by call/cc and related functions. The other supposed problems with CPS are that it fixes the evaluation order and that the transformed terms are harder to reason about or apply rewrite rules to. This paper says that CPS improves binding-time analysis and flow analysis by exposing more static calls. But again, CPS doesn't work well with LLVM.
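
To see what the transform does to control flow, here is factorial in direct style and hand-converted to CPS in C (a sketch of the idea, not any particular compiler's output): after the transform nothing returns, every intermediate result is passed to an explicit continuation, which is what fixes the evaluation order and exposes all the calls to flow analysis.

    #include <stdio.h>
    #include <stdlib.h>

    /* Direct style: the call stack implicitly remembers the pending multiplies. */
    static int fact(int n) { return n <= 1 ? 1 : n * fact(n - 1); }

    /* CPS: a continuation is a function pointer plus its captured environment. */
    typedef struct cont {
        void (*fn)(int result, struct cont *self);
        int captured_n;
        struct cont *next;
    } cont;

    static void apply(cont *k, int result) { k->fn(result, k); }

    /* Continuation for the recursive case: multiply by the captured n, pass on. */
    static void mul_then_continue(int result, cont *self) {
        cont *next = self->next;
        int n = self->captured_n;
        free(self);
        apply(next, n * result);
    }

    /* No call returns a value; every result is handed to a continuation. */
    static void fact_cps(int n, cont *k) {
        if (n <= 1) { apply(k, 1); return; }
        cont *k2 = malloc(sizeof *k2);
        k2->fn = mul_then_continue;
        k2->captured_n = n;
        k2->next = k;
        fact_cps(n - 1, k2);
    }

    static void print_it(int result, cont *self) { (void)self; printf("cps: %d\n", result); }

    int main(void) {
        printf("direct: %d\n", fact(10));   /* 3628800 */
        cont done = { print_it, 0, NULL };
        fact_cps(10, &done);                /* cps: 3628800 */
        return 0;
    }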

What evaluation order do we want? For Haskell it is call-by-need; for ML and most other languages it is call-by-value. Personally, though, I am attracted to optimal reduction; it can make repeated application logarithmic in the number of reduction steps, and it also exposes opportunities for fusion, deforestation, etc. The underlying representation, interaction nets, can support call-by-name, call-by-value, and call-by-need. The issues seem to be (1) blowup in the graph representation due to redundant fans and duplication, (2) handling recursion (e.g. let), (3) no efficient implementation of the 'bookkeeping' oracle that determines whether two fans should cancel or copy, and (4) graph representation and reduction overhead. I tried hacking on an optimal reduction implementation for a week, and my conclusion is that it has a high learning curve and needs a lot of tool support, but it's workable. Solving the bookkeeping overhead seems like a PhD topic, though, so an initial prototype could just use closed reduction; that's still better than call-by-need.
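
To give a flavour of issues (1), (3), and (4), here is a sketch of the fan-fan interaction step in C (the node layout and names are made up for this example; real implementations differ): two fans meeting on their principal ports either cancel or copy each other, and knowing which case applies for fans created during reduction is exactly the bookkeeping-oracle problem.

    #include <stdlib.h>

    /* Each node has a principal port (index 0) and two auxiliary ports;
       reduction only rewrites pairs of nodes joined principal-to-principal. */
    typedef enum { FAN /*, LAM, APP, ERA, ... */ } node_kind;

    typedef struct node {
        node_kind kind;
        int label;              /* duplication "level" this fan belongs to */
        struct node *peer[3];   /* peer[i]: the node wired to port i       */
        int slot[3];            /* slot[i]: which port of that peer        */
    } node;

    static void wire(node *a, int i, node *b, int j) {
        a->peer[i] = b; a->slot[i] = j;
        b->peer[j] = a; b->slot[j] = i;
    }

    /* Fan meets fan on their principal ports. */
    static void interact_fans(node *a, node *b) {
        if (a->label == b->label) {
            /* Cancel: wire the auxiliary ports straight through. */
            wire(a->peer[1], a->slot[1], b->peer[1], b->slot[1]);
            wire(a->peer[2], a->slot[2], b->peer[2], b->slot[2]);
            free(a); free(b);
        } else {
            /* Copy: each fan is duplicated and the four copies rewired in a
               square; this is where the graph blows up (issue 1) and where a
               wrong label means a wrong reduction (issue 3). Elided here. */
        }
    }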

1 comment:

  1. Thorin is a graph-based extension of CPS which solves the scoping problem and compiles to LLVM, worth a look.
