Monday, January 13, 2020

Stroscot Devlog 4

The parser’s kind of working, aside from features like EBNF and scannerless parsing that will be useful later but aren’t strictly necessary. So now I am writing the middle parts of the Stroscot programming language.

I started out by wanting to implement universes with dependent types, in particular experimenting with the New Foundations idea. I also wanted to add subtyping à la Dolan. Then I remembered insane, and that it had to suspend most of the “check” calls in order to check recursive types successfully, so I needed an evaluation model. I wondered whether type inference was any more complicated, so I looked at Roy and studied its unify function. My conclusion was that implementing type inference mostly means adding type variables and type solving (unification) to an existing type checker, without changing much about the checker’s structure, so starting with a dependent type checker would be best.
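To make that concrete for myself, here is roughly the kind of thing Roy’s unify does, as a minimal Haskell sketch over a made-up toy type language (TVar/TCon/TArr and the substitution representation are my placeholders, not Roy’s code and not Stroscot’s types):

```haskell
import qualified Data.Map as Map
import           Data.Map (Map)

-- Toy type language for the sketch.
data Type
  = TVar String        -- inference variable
  | TCon String        -- base type, e.g. "Int"
  | TArr Type Type     -- function type
  deriving (Eq, Show)

-- A substitution maps variable names to types.
type Subst = Map String Type

-- Apply a substitution, following chains of bound variables.
apply :: Subst -> Type -> Type
apply s (TVar v)   = maybe (TVar v) (apply s) (Map.lookup v s)
apply _ (TCon c)   = TCon c
apply s (TArr a b) = TArr (apply s a) (apply s b)

-- Unify two types under an existing substitution, extending it or failing.
unify :: Subst -> Type -> Type -> Maybe Subst
unify s t1 t2 = go (apply s t1) (apply s t2)
  where
    go (TVar v) t = bind v t
    go t (TVar v) = bind v t
    go (TCon a) (TCon b) | a == b = Just s
    go (TArr a1 b1) (TArr a2 b2) = do
      s' <- unify s a1 a2
      unify s' b1 b2
    go _ _ = Nothing

    bind v t
      | t == TVar v = Just s                  -- same variable, nothing to do
      | occurs v t  = Nothing                 -- occurs check: reject infinite types
      | otherwise   = Just (Map.insert v t s)

    occurs v (TVar w)   = v == w
    occurs v (TArr a b) = occurs v a || occurs v b
    occurs _ (TCon _)   = False
```

The point being that none of this touches the shape of the checker itself; the checker just calls unify where it would otherwise compare types for equality.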

Then I looked for dependent type checker tutorials and found this; it starts out by presenting an evaluator for the untyped lambda calculus and then extends it gradually. LambdaPi takes a similar approach. They both use environments and closures, but with optimal reduction there are duplicators rather than closures and the environments are stored in the graph, so the implementation is quite different.
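For my own reference, this is roughly the environment-and-closure evaluator shape those tutorials start from, as a minimal sketch of my own (closed terms, strict evaluation, de Bruijn indices), not their actual code:

```haskell
-- Untyped lambda terms with de Bruijn indices.
data Term
  = Var Int        -- index into the environment
  | Lam Term       -- lambda body
  | App Term Term  -- application

-- A value is a lambda body paired with the environment it was built in.
data Value = VClosure Env Term

type Env = [Value]   -- the environment is a stack of values

-- Strict evaluator for closed terms.
eval :: Env -> Term -> Value
eval env (Var i)   = env !! i
eval env (Lam b)   = VClosure env b
eval env (App f a) =
  case eval env f of
    VClosure env' b -> eval (eval env a : env') b

-- e.g. eval [] (App (Lam (Var 0)) (Lam (Lam (Var 1))))
--        == VClosure [] (Lam (Var 1)), the value of \x y -> x.
```

With LambdaScope there is no Env or VClosure anywhere; sharing and substitution are handled by duplicator nodes in the net, which is why I can’t just transcribe this.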

So I figured I’d start from the optimal lambda calculus implementation in macro lambda calculus. But the first thing it does is a pseudo-alpha-renaming pass that identifies all the free variables and replaces them with wires. The code is also complicated quite a bit by being split into a separate inet-lib library, which takes a custom-syntax interaction net description and uses JavaScript’s eval features to implement the interaction rules. I went and read “Optimality and inefficiency”, where they show poor behavior on the family C_n = \x -> 2 (2 (... (2 x))) with n copies of 2 (so C_4 = \x -> 2 (2 (2 (2 x)))), where 2 = \s z -> s (s z), when applied to \x y -> x y and \x y -> y x. I still like optimality though; the issues they present are unrelated to interaction nets and can be worked around with the garbage collection in Asperti/BOHM.

I looked for more implementations of LambdaScope and it seems that Jan Rochel’s graph-rewriting-lambdascope package (Haskell) is the only one. There are MaiaVictor’s optlam / absal (JS) and Vizaxo’s optal (Haskell), but they use Lamping’s algorithm, which has brackets/croissants and more overhead. They’re still worth looking at for ideas on how to implement interaction nets, though.

I was thinking that perhaps Kernel/fexprs had an interesting solution to variable lookup, since the environment is passed around explicitly, so I was reading through the thesis and got lost. I tried reading the paper instead, and it also seems very long and full of Scheme-isms, with not much on lambdas. The primitives are wrap and vau; somewhere in there we can insert optimal evaluation, but I didn’t get to environments yet.


Thursday, January 9, 2020

Stroscot Devlog 3

So I've been reading a lot about Earley parsers. They're divided into predictor, completer, and scanner sections. There are some “interesting observations” in the “Directly Executable Earley Parsing” paper: mainly, that additions are only ever made to the current and next Earley sets. We still have to store the older Earley sets, but I noticed that we only ever look items up in them by the nonterminal after the dot, so we can index each set by that nonterminal and look back to retrieve the waiting items during completion. It seems clear that the sets themselves are mostly for notational, pedagogical, and debugging purposes; the algorithm really only needs the lists of suspended nonterminals pointed to by in-progress nonterminals.
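To pin down what that means, here’s a toy Haskell sketch of the three operations (my own illustration, not the paper’s code, and the driver loop over sets is omitted): the scanner is the only operation that adds to the next set, prediction adds to the current set, and completion adds to the current set while only reading older sets through the origin index.

```haskell
-- Toy grammar and item types.
data Rule = Rule { lhs :: String, rhs :: [String] }

data Item = Item
  { rule   :: Rule
  , dot    :: Int   -- how much of rhs has been recognized
  , origin :: Int   -- index of the Earley set this item started in ("parent set")
  }

-- The symbol after the dot, if any.
nextSym :: Item -> Maybe String
nextSym (Item r d _) | d < length (rhs r) = Just (rhs r !! d)
                     | otherwise          = Nothing

advance :: Item -> Item
advance it = it { dot = dot it + 1 }

-- Predictor: the dot is before nonterminal nt; add nt's rules to the current set i.
predict :: [Rule] -> Int -> String -> [Item]
predict grammar i nt = [ Item r 0 i | r <- grammar, lhs r == nt ]

-- Scanner: the symbol after the dot matches the current token;
-- the advanced item goes into the *next* set.
scan :: String -> Item -> [Item]
scan tok it | nextSym it == Just tok = [advance it]
            | otherwise              = []

-- Completer: an item is finished; look back at its origin set and advance
-- every item there that was waiting on this item's left-hand side.
complete :: [[Item]] -> Item -> [Item]
complete sets finished =
  [ advance waiting
  | waiting <- sets !! origin finished
  , nextSym waiting == Just (lhs (rule finished))
  ]
```

The completer is exactly the “look back” I mean: the only thing it needs from an older set is the items indexed by the nonterminal after their dot.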

The other thing I looked at was the Earley-LR optimization, which basically turns the grammar into an LR(0) automaton with shift/reduce ambiguity resolved by the Earley sets; generally speaking it seems like it can speed up the parser. Each Earley item is normally a dotted production rule together with a parent set, but you can replace the dotted rule with an LR(0) DFA state, which is a set of LR(0) items.
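As a type-level sketch (my own, not Marpa’s or Aycock/Horspool’s code), the change looks like this:

```haskell
data Rule       = Rule String [String]       -- lhs, rhs
data DottedRule = DottedRule Rule Int        -- rule, dot position

-- Classic Earley item: one dotted rule plus the parent-set index.
data EarleyItem = EarleyItem DottedRule Int

-- Aycock/Horspool-style item: a whole LR(0) DFA state (a set of dotted
-- rules that always travel together) plus the parent-set index.
newtype LR0State = LR0State [DottedRule]
data AHItem      = AHItem LR0State Int
```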

According to Jeffrey Kegler: “In creating Marpa, I may have acquired as much experience with the AH algorithm as anybody, possibly even Aycock/Horspool themselves. I found the special LR(0) items they create to be counter-productive. AH's own implementation apparently had bugs due to duplication of Earley items. I solved these, but it made the code very complex and slower. Most of the compression that the AH method bought was in predictions, which could be compressed in other, less complicated, ways. Using AH items means that whenever something needs to be done in terms of the user's grammar, there has to be a translation. Since the AH item to Earley item relationship is many-to-many, this is not trivial. The cost of the more complex evaluation phase seems likely to eliminate any speed-up in the recognizer.

In “Practical Earley Parsing”, as a side issue, they describe a way of handling nulled symbols and productions via grammar rewrites. This proved very valuable and Marpa *does* continue to use that. It's a bit ironic that I ended up ignoring a lot of AH's paper, but that their little implementation trick has proved central to my efforts.” (edited, see here)

So in summary, LR doesn’t add much to Earley parsing, but computing nullability and handling it specially does. It seems like they're optimizing their parser a lot, but they're optimizing the wrong things. Memory layout and cache use are the main things to pay attention to on modern processors, and they don't seem to have a very interesting layout. And the radix tree for Earley items seems completely pointless: you have to iterate over all the pending Earley items anyway, so you might as well just use a linked list.
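The nullability computation itself is the easy part; a tiny fixed-point sketch of my own (toy grammar type, nothing Marpa-specific):

```haskell
import Data.List (nub)

data Rule = Rule { lhs :: String, rhs :: [String] }

-- A nonterminal is nullable if some rule for it has a right-hand side made
-- entirely of already-known-nullable symbols (an empty rhs trivially qualifies).
-- Iterate until the set stops growing.
nullables :: [Rule] -> [String]
nullables grammar = go []
  where
    go known =
      let known' = nub [ lhs r | r <- grammar, all (`elem` known) (rhs r) ]
      in  if length known' == length known then known else go known'
```

The grammar rewrite that uses this information is the part the quote above says proved valuable.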

The practical paper also leads to this rabbit hole of direct code execution / threaded execution / direct threaded code. It’s useful in parsers, but it appears in any interpreter, e.g. Python. Python uses computed gotos: the basic idea is that, instead of jumping back to a central dispatch loop after each instruction, each handler looks up the dispatch table itself and jumps directly to the next handler. Python does this with the first-class labels feature available in GNU C. But reading the “Directly Executable Earley Parsing” paper, it just seems like a bunch of pointer stuff, and it doesn't do the state splitting that they do in their later papers.
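Haskell doesn’t have computed goto, but to remember the idea I sketched the closure-based analogue: translate the instruction list once, so each step hands control straight to the next handler instead of returning to a dispatch loop. This is just my illustration of the general trick on a made-up stack machine, not how CPython or the directly executable parser actually implements it:

```haskell
data Op = Push Int | Add | Mul

-- Naive interpreter: every step returns to this dispatch loop.
runLoop :: [Op] -> [Int] -> [Int]
runLoop []             stack       = stack
runLoop (Push n : ops) stack       = runLoop ops (n : stack)
runLoop (Add    : ops) (a:b:stack) = runLoop ops (a + b : stack)
runLoop (Mul    : ops) (a:b:stack) = runLoop ops (a * b : stack)
runLoop _              _           = error "stack underflow"

-- "Threaded" version: compile the program once into a chain of closures,
-- so each instruction's code jumps directly into the next instruction's code.
-- (Stack underflow is not handled in this sketch.)
compile :: [Op] -> ([Int] -> [Int])
compile = foldr thread id
  where
    thread (Push n) k = \stack       -> k (n : stack)
    thread Add      k = \(a:b:stack) -> k (a + b : stack)
    thread Mul      k = \(a:b:stack) -> k (a * b : stack)

-- Both give the same answer:
--   runLoop [Push 2, Push 3, Add, Push 4, Mul] [] == [20]
--   compile [Push 2, Push 3, Add, Push 4, Mul] [] == [20]
```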

Another question is whether we can implement parser combinators on top of Earley, memoizing all the states somehow. It seems like it might be possible with enough recursion and memoization hacking, but it's probably not worth it.

And I still don't know how to handle extended grammars, and I haven't looked at the Leo paper at all. But from skimming the code of some people who’ve implemented it, Leo’s optimization seems like it’s just a matter of keeping track of certain items. Extended Earley grammars generalize the dot position in the rule, so the dot is a state in an NFA (DFA?) instead of an index into a linear sequence. This should be easily accomplished with the help of some papers I have on extended rules.
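To check my understanding, the difference in types might be something like this (purely my guess at a representation, sketched before actually reading those papers):

```haskell
data Rule = Rule String [String]         -- lhs, rhs as a plain symbol sequence

-- Classic item: the "dot" is an index into the rhs sequence.
data PlainItem = PlainItem Rule Int Int  -- rule, dot index, origin set

-- Extended (EBNF-style) rule: the rhs is a little automaton, so the "dot"
-- becomes a state of that automaton instead of an index.
data RhsAutomaton = RhsAutomaton
  [Int]                 -- states
  [(Int, String, Int)]  -- transitions: (from state, symbol, to state)
  [Int]                 -- accepting states (rhs fully recognized)

data ExtendedItem = ExtendedItem String RhsAutomaton Int Int
  -- rule name, its rhs automaton, current state (the "dot"), origin set
```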