Sunday, December 22, 2019

Stroscot Devlog 2

Okay so programming language project post 2. Sorry that these are a little slow, I get backlogged...

I was wandering down a misty road and I saw a parser at the end of the tunnel. But did I see this parser’s code? No, I did not. 😞 I saw many parsers, all different! There was a tangled mess of parsers!

Well, I guess what I'm trying to say is that I've been reading lots of parsing papers, and some of them work with each other and some of them don't, and it's not clear which is which. I mean, it looks like they can all be assembled together into a single massive parser, but I don't know. So far what’s winning is the Earley parser. It basically has three stages: the scanner, the predictor, and the completer. They correspond to state transitions in LR parsers so a reduce is the same as a complete and a shift is the same as a scan. There’s no prediction step in an LR parser because it precomputes all the predictions when it processes the grammar into states.
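Just to pin down that scan/complete vs. shift/reduce analogy for myself, here's a rough sketch of an Earley item and the three operations, in my own notation (not taken from any particular paper):

```python
# A minimal sketch of the three Earley operations on dotted items; my own
# field names, just to make the scan/complete vs. shift/reduce analogy concrete.
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    head: str     # left-hand side, e.g. "E"
    body: tuple   # right-hand side, e.g. ("E", "+", "T")
    dot: int      # position of the dot in body
    start: int    # index of the Earley set where this item was predicted

def next_symbol(item):
    return item.body[item.dot] if item.dot < len(item.body) else None

# scanner   ~ LR shift:  the dot is before a terminal matching the next input symbol
# predictor ~ (precomputed into states by LR): the dot is before a non-terminal,
#             so add that non-terminal's rules to the current set
# completer ~ LR reduce: the dot is at the end, so advance the items in set
#             item.start that were waiting on item.head
```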

The standard paper on Earley parsing is “Practical Earley Parsing” by Aycock and Horspool. It shows how to merge an LR automaton with an Earley parser, based on work from an earlier 1996 paper, “A Faster Earley Parser”. There's also the Marpa paper, which has some code snippets and improvements on the Aycock paper. Then there are a bunch of LR papers that look vaguely interesting but I haven't read yet. I'm honestly not sure I will.

So an LL parser has the first and follow sets, and then an LR parser is shift-reduce and builds up this LR automaton. The LR automaton is apparently the same for all the LR algorithms (SLR/LALR/GLR), an LR(0) automaton, because LR(1) automata with lookahead are too big. And then the Aycock-Horspool paper uses Earley-LR, a.k.a. LRE following the 1996 paper, also built on top of an LR(0) automaton. Meanwhile GLR is inefficient (worst-case exponential in its usual formulation) and nobody uses it besides people who don’t care about parser speed.

Then there’s the Adaptive LL paper (ALL(*)) by the ANTLR guy, about how to use LL lookahead to predict things. It actually has a very simple structure: it pretends it's an LL parser with no lookahead of its own, except it has a lookahead function which constructs a deterministic finite automaton, and this DFA is what actually predicts the next production. It then extends the automaton adaptively as it parses the input (which is the complicated part), and apparently it goes pretty fast. But the approach doesn’t handle ambiguous grammars or left recursion; he has a trick in the appendix for left recursion, but it doesn’t handle all cases. I still think I need to read more papers though. I haven’t looked at the “parsing with derivatives” paper yet, which is probably the most promising since it’s the newest approach.


But first I'm going to read this paper “SPPF-style parsers from cubic recognizers”. There are two main algorithms in the paper: the Earley algorithm and RIGLR. We'll start with the Earley algorithm. The Earley algorithm, let’s see... it has packed nodes and non-packed nodes. I found a nice blog post on SPPF trees, but the algorithm itself is still impenetrable.


I guess going section by section is easier than trying to scan back and forth in the paper for clues. The first section is just the introduction: extending the Earley recognizer given in textbooks so that it returns parses naively yields spurious derivations. Then they have BRNGLR, which is the alternative they compare against. Section 2 is just definitions; they include the created SPPF nodes in the Earley items, so the items become triples labeled with a grammar position (dotted rule), parent set index, and SPPF node. Then there are derivation trees, and the shared forests are binarized so that nodes have out-degree at most 2. Then they have a one-sentence summary of Earley’s algorithm, around the middle of page 57. I think the contributions of this SPPF paper are mostly in the form of how to write an Earley parser (and RIGLR parser) correctly, and the amount of explanation of the motivation or reasoning behind it is essentially nil. However, it does have a good section that separates the phases of Earley parsing out, and it pays more attention to the order in which the scanner / completer / predictor get executed than other papers, which are just like “execute them all until they stop adding items”.
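Here's roughly what I think the SPPF node shapes look like, with my own field names, so take it as a sketch of the idea rather than the paper's actual data structures:

```python
from dataclasses import dataclass, field

# Symbol and intermediate nodes are labeled with (symbol-or-slot, start, end);
# ambiguity shows up as multiple packed children under one node, and the
# binarization means a packed node has at most a left and a right child.
@dataclass
class SymbolNode:
    label: object          # a grammar symbol, or an "intermediate" slot label
    start: int             # left extent in the input
    end: int               # right extent in the input
    packed: list = field(default_factory=list)   # alternative derivations

@dataclass
class PackedNode:
    slot: object           # the grammar slot (rule + dot position) used
    pivot: int             # split point between the two children
    left: object = None    # SymbolNode or None
    right: object = None   # SymbolNode
```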


So the other thing in life I’m thinking about was the operational semantics and whether using the optimal-reduction graph machine is a good idea. I've been mulling it over and concluded that it can’t hurt; I mean, I know you can't just bolt eager reduction on later, but it’s easy to add special-purpose reductions that split up the graph, and hard to find those just by thinking about the problem in advance. So yeah, in conclusion, I think the best thing is to implement optimal reduction.


The other thing I was thinking about was scopes. So if we have a statement like “let x equal 3 in y” then, interpreting that statement, it seems like the best approach would be to construct a record with x equals 3 as its contents, and then y gets passed that record as an implicit environment argument. That should let all the names and stuff be resolved appropriately, and then you'd have read and write scope commands, and those commands end up passing the record through.
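A toy sketch of that record-as-environment idea, with made-up AST constructors:

```python
# Toy sketch of the record-as-scope idea; the AST shapes here are made up.
def eval_expr(expr, env):
    kind = expr[0]
    if kind == "let":                  # ("let", name, value_expr, body_expr)
        _, name, value, body = expr
        child = dict(env)              # the new "record" for this scope
        child[name] = eval_expr(value, env)
        return eval_expr(body, child)  # body gets the record as its environment
    if kind == "var":                  # ("var", name) -- look up in the record
        return env[expr[1]]
    if kind == "lit":
        return expr[1]
    raise ValueError(kind)

print(eval_expr(("let", "x", ("lit", 3), ("var", "x")), {}))  # 3
```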

So another thing to do is look at the difference between parsing regular expressions with derivatives and parsing grammars with derivatives and also I want to see what the Antimirov derivatives are about because apparently they help with the capturing group stuff and that might be useful for parsing. It would be interesting to have non-capturing productions, that don’t show up in the generated parse tree.
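Before getting into Antimirov derivatives and derivatives of grammars, here's the plain Brzozowski-derivative matcher for regular expressions, just so the basic idea is written down somewhere (my own encoding of regexes as tuples):

```python
# Minimal Brzozowski-derivative matcher for plain regular expressions.
# Regexes are tuples: ("empty",), ("eps",), ("chr", c), ("alt", r, s),
# ("seq", r, s), ("star", r).

def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"): return True
    if tag in ("empty", "chr"): return False
    if tag == "alt": return nullable(r[1]) or nullable(r[2])
    if tag == "seq": return nullable(r[1]) and nullable(r[2])

def deriv(r, c):
    tag = r[0]
    if tag in ("empty", "eps"): return ("empty",)
    if tag == "chr": return ("eps",) if r[1] == c else ("empty",)
    if tag == "alt": return ("alt", deriv(r[1], c), deriv(r[2], c))
    if tag == "seq":
        d = ("seq", deriv(r[1], c), r[2])
        return ("alt", d, deriv(r[2], c)) if nullable(r[1]) else d
    if tag == "star": return ("seq", deriv(r[1], c), r)

def matches(r, s):
    for c in s:
        r = deriv(r, c)
    return nullable(r)

ab_star = ("star", ("seq", ("chr", "a"), ("chr", "b")))
print(matches(ab_star, "abab"), matches(ab_star, "aba"))  # True False
```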


So back to the big picture, parser goals:

Number 1 is parsing regular-expression-style things and EBNF, or even Boolean grammars, on the right-hand sides

Number 2 is handling LR(k) grammars in linear time

Number 3 is generating SPPF-packed parse forests as results

Number 4 is handling recursive grammars, cyclic productions, and nullable elements


So let's talk about the structure of a parser. A parser is generally something that takes a string of characters and a grammar and produces an output data structure, which in our case is going to be the SPPF format, because that's the easiest format to use so far and it’s cubic time for ambiguous grammars. I guess that’s slow, but hopefully there isn’t much ambiguity in a grammar.

The first step in parsing is to process the grammar into a usable form; when we're building a parser generator, this does a lot of pre-processing and probably builds a big finite state machine that handles every character in the grammar. It seems like that's kind of necessary for efficiency. And there are some papers on incremental parsers and on modifying LR state machines to add or remove grammar rules.

As far as testing, I guess I’ll use a number of different complicated expressions. First are the examples I've collected from other parser papers, counter-examples that are not supposed to work. The simplest is probably the calculator or arithmetic language: it’s not LR as usually written, and handling precedence and associativity is hard.


So the first goal is to understand the SPPF format. Parsing is in some sense an assignment problem: we want to assign values to the different tree start-and-stop positions in order to construct a tree. And in Marpa it’s a bit more complicated, because there are different start-and-stop positions for each token/earleme/lexeme thing.

The interesting part is handling recursion. Recursion happens on the left side, on the right side, in the middle, and then you can have cyclic rules that refer to each other directly or indirectly, so you have indirect recursion. So how do we deal with this? The answer is we have to maintain multiple states. The question is how you maintain them. The Earley parser seems to have the most efficient representation of ambiguity, because it allows sharing token states between multiple different parses, whereas GLR uses a more complicated graph-structured stack and seems to have a lot of duplication since it only shares prefixes and suffixes. So the Earley parser seems to have a finer-grained sharing/tracking algorithm.

So the ambiguity part is interesting as well. Earley parsing handles left recursion really well: it just collapses the left recursion into a single state. And Leo’s optimization from his paper handles right recursion really well, and again it collapses everything into a single state. The reason to use Aycock-Horspool’s LR stuff is that it collapses a lot of the state transitions by grouping Earley items into LR(0) states. The questionable part is how it handles nullable rules. Nullable rules, I was thinking, could also be handled by creating an epsilon-aware LR automaton and adding some more complicated transition logic, so it's not clear that you have to modify the grammar like Aycock-Horspool do, or whether it's possible to just construct a special deterministic LR automaton that works on the original grammar.
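Whichever way the nullable handling goes, the nullable set itself is a simple fixpoint; here's the standard computation as a sketch, using a made-up grammar format:

```python
# Standard fixpoint computation of the nullable non-terminals, which is the
# input to either approach (Aycock-Horspool's grammar rewrite or an
# epsilon-aware automaton). Grammar format: {nonterminal: [RHS tuples]}.
def nullable_set(grammar):
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            for body in bodies:
                # an empty body () is trivially all-nullable
                if all(sym in nullable for sym in body):
                    nullable.add(head)
                    changed = True
                    break
    return nullable

g = {"S": [("A", "B")], "A": [("a",), ()], "B": [()]}
print(nullable_set(g))  # {'A', 'B', 'S'} (in some order)
```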

The next question is how do I use it for operators? Operator precedence is also hard for a parser to deal with. I found a parsing thesis that disambiguates SPPF trees using mixfix precedence information, and it seems easy enough, so I just have to read it.

The next thing is how to handle layout in parsing: lexing, indentation, and whitespace. They seem like trivial things, since I found some nice papers (which I haven't read yet either), but they should just be modifications to the grammar, with more concatenative operators and extra ways of matching tokens.

I have no idea how this cubic Earley algorithm works. It's just so complicated. It's really intimidating. The start can be ignored; it’s initialization. You can just use a dummy rule with the start symbol on the right hand side and it will initialize itself, and this will also be needed for the Leo optimization to work anyway. This strategy is used in Bison too in its LR parsers.

The next part of the algorithm is all in a loop over the Earley sets; the loop index is the position in the input.

Then there's the set R, which is defined in their earlier 2008 paper on Earley parsing along with Q. They’re both queues of pending items to be processed. Q is the set of items whose next symbol is a terminal and R is everything else. So R is processed first and then Q is processed to move to the next token. And then there is a holding set H for completing null derivations.

The Earley parser algorithm's input is a grammar Gamma composed of (let me look it up again) a set N of non-terminal symbols, a set T of terminal symbols, an element S in N called the start symbol, and a set P of numbered grammar rules of the form A ::= alpha, where A is a non-terminal symbol and alpha is a possibly empty string of terminals and non-terminals. The symbol epsilon denotes the empty string.
And then there is the input string a1 a2 ... up to an. Okay.
So the actual Earley parser algorithm: first, we initialize E0 through En to the empty set; we also initialize the sets R, Q′, and V to be empty.
Next, for all rules in P whose left-hand side is the start symbol S, call the right-hand side alpha: if alpha is empty or begins with a non-terminal, the item goes into E0, and if alpha begins with a terminal (one matching a1), the item goes into Q′... alright, this is the point where writing it down as code directly is easier.
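So here it is as code — a plain Earley recognizer, no SPPF construction, no Leo optimization, none of the R/Q′/H bookkeeping from the paper, just the control flow (the grammar format and names are my own sketch):

```python
from collections import namedtuple

Item = namedtuple("Item", "head body dot start")

def recognize(grammar, start, tokens):
    n = len(tokens)
    E = [set() for _ in range(n + 1)]
    E[0].add(Item("S'", (start,), 0, 0))      # dummy start rule S' -> start
    for i in range(n + 1):
        work = list(E[i])
        while work:
            item = work.pop()
            nxt = item.body[item.dot] if item.dot < len(item.body) else None
            if nxt is None:                               # completer (~ LR reduce)
                for waiting in list(E[item.start]):
                    if (waiting.dot < len(waiting.body)
                            and waiting.body[waiting.dot] == item.head):
                        new = waiting._replace(dot=waiting.dot + 1)
                        if new not in E[i]:
                            E[i].add(new)
                            work.append(new)
            elif nxt in grammar:                          # predictor
                for body in grammar[nxt]:
                    new = Item(nxt, tuple(body), 0, i)
                    if new not in E[i]:
                        E[i].add(new)
                        work.append(new)
            elif i < n and tokens[i] == nxt:              # scanner (~ LR shift)
                E[i + 1].add(item._replace(dot=item.dot + 1))
    # accept if the dummy start item is complete in the last set
    return any(it.head == "S'" and it.dot == 1 for it in E[n])

# Note: this naive completer has the well-known corner case with nullable rules
# that Aycock-Horspool spend so much effort on; it's ignored here.
g = {"E": [("E", "+", "T"), ("T",)], "T": [("1",)]}
print(recognize(g, "E", ["1", "+", "1"]))  # True
```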

So I also wanted to talk about types. I really don't like type annotations cluttering the program and making it run slow, but they become useful with type inference. In particular you have constraint solving. Constraint solving works really well in Haskell, and it lets you write more code automatically, so I want to expand on that in my language and have at least some amount of constraint solving. In particular you need a coerce function that automatically converts between any two types, and then you need to be able to overload methods and have them resolved by type.
The most useful thing I've seen so far is the subtyping algorithm in Brick and MLsub, which Stephen Dolan wrote. As far as doing something else, I think the only solution is to hack until it works. I’ve read a bunch of type papers and they all end up leaving it up to you to make it work. If they do present an algorithm it’s very simple, and the papers are mostly about proving the algorithm correct and complete rather than about the implementation.

I did a crawl of old papers citing Earley parsers and I didn't really find anything new. Google only shows the first 1000 results, sorted by citations, and there’s the Leo paper and the Horspool paper but not much else. I downloaded a few just to look at later but there aren't any really novel ideas, just optimizations and benchmarks.

Saturday, May 11, 2019

Stroscot Devlog 1

A programming language has a lot of components: the parser, the operational semantics, the code generator...
For the code generator I think we'll just use LLVM because LLVM is the most well-developed. It can handle the register spilling and some of the other stuff that we need to handle. LLVM is used by Haskell and Manticore, the interesting CPS compiler which I wanted to try, not to mention Zig, Rust, Jai, Julia, Pure. And of course LLVM has clang so it's easy to interface with C and C++ code. So a lot of good reasons to use LLVM and there aren't really many projects competing with it. The issues seem to be that it changes frequently and that the optimizations are kind of hit-or-miss depending on what kind of code you generate (although they can be disabled). If I was going to write an alternative I would use the Intel xed library because I don't want to read all those huge Intel PDFs of instruction tables and stuff, but LLVM is like xed on steroids and with cross-platform compatibility thrown in.

Another part of the project is the support tools, like the documentation and the code highlighting and the pretty printer/source code formatter and the automated refactoring and the hyperlinked source code browser search interface.

That last is implemented at Mozilla with the searchfox interface. Then there’s debian’s code search that uses the library codesearch. But for a small project grep probably suffices. I also hope these devlog notes will get turned into reference material somehow like GHC’s notes and commentary.

For documentation the main functionality I want is to include highlighted snippets of code for elucidating explanations, and then just a way to write normal HTML stuff for a web browser. I assume the web is the best way to convey documentation. I don't know about videos; I find they're generally hard to skim, so they're best suited for introductions. There are 2-minute videos for real introductions and then 10-minute videos which seem more like infomercials than useful video. And then you go to a talk that's an hour long or something; it's a complete waste of time unless you’re the speaker.

Another big component is the standard library. And of course the standard library ends up being used by the compiler to write itself, so you have to design the library at least somewhat concurrently with the actual compiler. Most of the library writes itself or uses standard algorithms, but then there are intrinsics and primops. Primops are the most interesting part of the compiler because they're the stuff that you can't implement in the language, yet somehow they are implemented. Perhaps primop is a misleading name, because really what they are is calls to external functions written in other languages, so they're really “foreign function calls”, and they’re just called “foreign” because they're not written in the native programming language that you're using right now. At this point, for GHC, primops are apparently just LLVM IR with a special calling convention, so it's probably not necessary to do it the way GHC does.

Now, the best way to use an FFI — that's one thing I've been looking at a lot, because the better the FFI is, the easier your language is to use. You just say “Aha! I want to use this PL”, you drop it into your project, you import all your existing files (which is where the work is, depending on how hard your FFI is to use), and then you just write your programs in your new language. That's how Mozilla is translating Firefox from C++ to Rust: they're using the Rust FFI with C bindings. It's not very good, honestly. I think the best thing is to use the LLVM clang interface and interface with C++ directly. But C++ as a language has problems: it’s huge. It has templates and gobbledygook, basically the name mangling for function overloading, and to write a good C++ FFI you have to match this interface somehow. So really the only way to interface with C++ is to write your own vtable and name-mangling support, or to use someone else's that's already implemented, and of course clang is the obvious choice here.

So what I want to do is use the ROOT C++ interpreter Cling to compile snippets of code, and then gradually extend that to building a C++ AST ourselves. But right now I'm not sure how hard it will be; it might be impossible. There are some slides about how Cling builds up a global definition table and then compiles things over and over, but I'm not sure how that works for ahead-of-time compilation. I have to look at the code, because the slides are no help.

I still need to figure out the architecture for my project folder. I'm not sure if it’s important if you're doing everything with code search and just whizzing around the project with hyperlinks and stuff, but it would be good to figure out which modules can import each other and assign places for new functions that make sense. So it's a classification problem with a DAG requirement attached. Right now I have this docs / glue / library layout, and it's actually starting to be too much work. So maybe the most pressing priority now is for the documentation to be more navigable than just simple text files, so I need to build an HTML linking thing.

I also want to know how to make money with programming language design. Tabled for now since the language has no users. I’m interested in Python, machine learning, and CUDA; the hard part is mixing C++, machine learning, and all that in one language. Ideally our FFI will support more than just C++, so that you can use programming languages from one place and just mix and match them and have this polyglot language that kind of patches things together.

Another option is to fake mixing languages by composing their syntax definitions, but figuring out how to compose grammars is a hard problem. There are some papers to read. My basic theory now is that you do it with grammar rules. You just say: I have a grammar for the C language, I have a grammar for the Java language, I'll just take the expression rule from one language and the other language and write a new expression rule that combines the syntax of both (see the sketch below). Then there's a slightly tricky bit where you modify the fixed point, and with scannerless parsing it actually works.
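The sketch, with made-up mini-grammars just to show the shape of the idea:

```python
# Toy sketch of composing grammars by merging a rule; made-up mini-grammars,
# represented as {nonterminal: [alternative RHS tuples]}.
c_grammar    = {"expr": [("ident",), ("expr", "->", "ident")]}   # a C-ish fragment
java_grammar = {"expr": [("ident",), ("expr", ".", "ident")]}    # a Java-ish fragment

def combine_rule(g1, g2, nonterminal):
    """New grammar whose `nonterminal` accepts the alternatives of both."""
    merged = dict(g1)
    merged[nonterminal] = g1[nonterminal] + [
        alt for alt in g2[nonterminal] if alt not in g1[nonterminal]
    ]
    return merged

mixed = combine_rule(c_grammar, java_grammar, "expr")
print(mixed["expr"])
# Every recursive reference to "expr" now resolves to the merged rule --
# that's the "modify the fixed point" part.
```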

On a digression, changing the fixed point in the code is refactoring. I want refactoring to be possible by just writing a wrapper around a method that redefines its semantics; then you run a little compiler pass, like “expand this definition”, and it goes through and, without interpreting most of the semantics, fuses the language functions enough to get rid of the old definition, and you get this new, prettier definition that does what you wanted. There’s a debate over whether you want to refactor directly: sometimes you want to do the rewrite directly, but other times you want to include the definition somewhere else and you don't want the original modified, i.e. it's clearer to write it as the refactoring itself than to run the refactoring over the code. Like when you're writing a parser incrementally as a tutorial: you might want to start with a naive slow algorithm and then gradually add more complicated and more complex optimization stuff. But you don't want the optimizations to be in the code; you want them to be like a separate method on top of the code.
And so what we need is an optimization database, kind of like GHC interface files. Interface files are complicated, they store a lot of data, but the data is all derived, so you need some kind of dependency management system. Eventually you want an entire package manager for a single programming language that manages the code in the files, but I don't know how that works. Dependency management is complicated, except it's not really that hard. There’s a project called Adapton: you basically have a graph and two phases, invalidation, which propagates up, and evaluation, which propagates down, and there's an Adapton paper. The problem is when you get names and recursion and all that stuff, and that's when you're like “oh, I'll just fuck it all” and you need an entire programming language just to handle incremental computations. Incremental computation is hard, so it’s a little complicated to extend it to handle recursion.
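A minimal sketch of that two-phase idea — invalidate upward, re-evaluate downward on demand. This is not Adapton's actual API, just the shape of it:

```python
# Minimal sketch of a two-phase incremental cell: invalidation propagates up
# to dependents, evaluation propagates down to dependencies on demand.
class Cell:
    def __init__(self, compute, deps=()):
        self.compute, self.deps = compute, list(deps)
        self.dependents, self.value, self.dirty = [], None, True
        for d in self.deps:
            d.dependents.append(self)

    def invalidate(self):                 # phase 1: propagate dirtiness upward
        if not self.dirty:
            self.dirty = True
            for d in self.dependents:
                d.invalidate()

    def get(self):                        # phase 2: re-evaluate on demand
        if self.dirty:
            self.value = self.compute(*[d.get() for d in self.deps])
            self.dirty = False
        return self.value

    def set(self, value):                 # for input cells
        self.compute = lambda: value
        self.invalidate()

a = Cell(lambda: 1)
b = Cell(lambda: 2)
total = Cell(lambda x, y: x + y, deps=[a, b])
print(total.get())   # 3
a.set(10)
print(total.get())   # 12
```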
Also relevant is Nix. Nix has a programming language, kind of, and in Nix we have this derivation thing: the derivation gets a hash of the input files, then it computes stuff, and it stores the outputs at these hashed locations. But hashing doesn't really work, it's too expensive, so instead of hashes you have to use modification times. And the other problem with hashing the way Nix does it is that it makes working with existing software really hard, because the files are stored in random locations for isolation. The solution there is to use file access times or ptrace or a virtual file system or maybe even some virus-scanning API.

Another problem with hashes, for checking for modifications, is that they're slow. So you do want to use modification times, but you want to treat them as hashes: basically pretend the modification time is a hash of the file, and don't compare them with less-than, just ask “are these equal or not equal?”. And they're cheap. I mean, the alternative is to use some kind of filesystem watcher, but it turns out that basically just generates custom timestamps in a change list. Timestamps are provided by the operating system, and you don't really need a file system watcher except to avoid rescanning for file system changes. Basically the watcher just turns a thing that does lots of file system accesses into something that processes a change list. So the only thing to be careful about when writing an algorithm that uses timestamps, to keep it extensible to watchers, is that it has to scan first and then process the files, as two separate phases that can't be fused. It has to handle changes sequentially, basically. So it can't assume that a file hasn't been changed. And the timestamp format has to be flexible.
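A sketch of the mtimes-as-opaque-hashes idea (hypothetical helpers, not tied to any real build system):

```python
# Treat mtimes as opaque "hashes": compare for equality only, never with < or >.
import os

def snapshot(paths):
    """Record an opaque fingerprint (here: mtime in nanoseconds) per file."""
    return {p: os.stat(p).st_mtime_ns for p in paths}

def changed_files(old_snapshot, new_snapshot):
    """A file is 'changed' if its fingerprint is not equal -- no ordering used."""
    return [p for p, stamp in new_snapshot.items()
            if old_snapshot.get(p) != stamp]

# Usage pattern: scan first, then process the change list sequentially, so the
# same processing code works when a filesystem watcher supplies the list instead.
```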

And keywords. Yeah, I should implement keyword-based programming as well; it ties in with the whole polyglot-languages thing: you just pick a code snippet in any language, say “I want to use this code snippet”, and then just use it.

As well as keyword programming we could also do aspect-oriented programming, but I feel it’s not necessary and we can do similar things with documentation, because it's easier to store everything in one place. Looking for “what calls this function” is hard, because it requires global semantic search; but when you're looking for “what does this function do”, it's easier to have one big imperative method than lots of small stupid ones. That was some coding advice about Jai from Jonathan Blow.

Jai is not open source, but he has a ton of YouTube videos where he programs and hacks on it, so you can see a lot of the code just because it's on his screen. It's a very standard C++-ish language; it doesn't really have anything new as far as syntax goes. I'm honestly not sure why he's writing it, but it's cool to watch.

So, you know, building the documentation as one big hyperlinked mess on the web is probably the best way to deal with it.

In other news, I'm reading all these papers and I don't really feel like I'm processing them. I don't know how to process these big papers. I was writing an Earley parser and I was like “this paper is so complicated!” It wants me to implement a DFA, and I didn't know how they implemented a DFA, so I looked up a regular expression library, redgrep, because I knew it implemented a DFA, and I was like “oh yeah, that’s how it works”. But I don't know how to adapt that to a parser, and it's all so complex. From Googling, it seems there is no solution to a complex project except to document it carefully and work through it step by step, hence these development logs and trying to code step by step. It might work.

Maybe we could have a voice-activated programming language, because if we combined keyword programming with voice recognition then you could have a programming language that's driven by voice. Voice-activated programming for the win. I don't know, we'll see. But if I could write most of the code with my voice, like I’m writing these devlogs, that would be a good idea. And I was thinking about using English-like syntax for a lot of things, so that's one of the goals of the grammar parser and stuff: to have English-like syntax. I should check what parser Inform 7 uses, because that's the closest thing to natural-language parsing.

That reminds me of Zarf, Andrew Plotkin. He had this thing called rule-based programming and he had a page of ideas. The idea was that you could have a list and you would just specify precedence rules and then it would sort the list for you, and I don't know, that's kind of it I think: just do a topological sort (sketched below). And then there were the rules and, I don't know, rules are complicated. You have wrappers and defeasible rules and before rules and after rules and rules that trigger under certain conditions. The hard part is figuring out how to combine the rules. Common Lisp has multimethods, and I think they were done better in Dylan, Slate, ... something had a better multimethod system. This whole idea of dispatching commands is something that you can play with. Smalltalk is the original, but Atomo is a programming language that uses this message-passing thing. I also need to look it up to see how it handled defining assignment as syntactic sugar.
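The sorting part, at least, is just a topological sort over before/after constraints; a sketch with a made-up input format:

```python
# Plain topological sort over pairwise before/after precedence rules.
from collections import defaultdict

def order_by_rules(items, before_pairs):
    """before_pairs: iterable of (a, b) meaning a must come before b."""
    succs, indegree = defaultdict(set), {x: 0 for x in items}
    for a, b in before_pairs:
        if b not in succs[a]:
            succs[a].add(b)
            indegree[b] += 1
    ready = [x for x in items if indegree[x] == 0]
    out = []
    while ready:
        x = ready.pop()
        out.append(x)
        for y in succs[x]:
            indegree[y] -= 1
            if indegree[y] == 0:
                ready.append(y)
    if len(out) != len(items):
        raise ValueError("precedence rules contain a cycle")
    return out

print(order_by_rules(["close door", "open door", "walk through"],
                     [("open door", "walk through"),
                      ("walk through", "close door")]))
# ['open door', 'walk through', 'close door']
```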

Then I need to go over my entire blog again looking for the functions and stuff, because they aren't in the searchable interface yet. I’d like a searchable interface where I can just type words and have them show up. Google does an index thing, but most of my blog doesn’t seem to be indexed. They’ve been getting worse lately; I think Bing has a better search cache. So, building this website where you can search: it has to be fast, because fast is good. I need to figure out how to set up a website. That's the first project, set up a website. And of course the website has to be reproducible, so you need a way to just set it up without losing your stuff. So GitHub will not work, because I want a search, and the search has to run a script, and the script is not runnable on GitHub. Or maybe it is, if I just generate a big JS blob. Or I could find free hosting; I think Heroku has free hosting or something with dynos and stuff, but let's see.

Wednesday, February 6, 2019

Where is Daniel Burd?

Various commenters on blog posts ask "Where is Daniel Burd?" and "What is the state of plastic decomposition?".

I found an obituary that seems relevant. Looking at Genrich's papers (1 2) it seems clear he had a significant influence on the project. Since children generally have different interests from their parents, I would guess that Daniel is enjoying the fruits of his science fair winnings but has not pursued it further. The lack of internet presence is typical; most people generate surprisingly little footprint.

There are other similar science fair projects, it's not particularly hard as a research project.

I would guess the problem is that the bacteria isolated are not very fast or effective, hence decomposition using them is not feasible. Academic research seems to be focused on isolating and improving the enzymes the bacteria use. And of course for industry scaling up requires capital and expertise, but at the moment landfills and recycling are sufficient so there is not much financial incentive for commercialization yet.

Wednesday, January 30, 2019

Name resolution

The part of Spoofax I haven't really looked at yet but sounds cool is their name binding language. A (partial) implementation of semantics using declarative rules! But Stroscot is imperative, so the interesting part is the name resolution algorithm. We have "scopes", which have a parent-child relationship where lookup goes to the parent if it's not in the child.



In this example, we have scopes 1, 2, and 3, and then use sites with outgoing edges and declarations with incoming edges. There are also imports and exports. Then there's a complicated tie-breaking graph-walking algorithm which determines which use maps to which definition. I don't know, I expected more. There's another paper where they point out that name resolution and type checking are often co-dependent; it seems to amount to even more complex graph walking.
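The basic parent-child lookup rule is simple enough to write down; here's a sketch with a made-up representation that ignores the imports/exports and tie-breaking that the papers actually deal with:

```python
# Basic parent-child scope lookup: check the current scope, then the parent.
class Scope:
    def __init__(self, parent=None):
        self.parent = parent
        self.declarations = {}

    def declare(self, name, decl):
        self.declarations[name] = decl

    def resolve(self, name):
        scope = self
        while scope is not None:
            if name in scope.declarations:
                return scope.declarations[name]
            scope = scope.parent        # not found here: go to the parent
        raise NameError(name)

root = Scope()
root.declare("x", "x@root")
inner = Scope(parent=root)
inner.declare("y", "y@inner")
print(inner.resolve("x"), inner.resolve("y"))  # x@root y@inner
```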

Haskell doesn't have TDNR so it has a separate "renaming" pass before type-checking.

Wednesday, January 23, 2019

More on parsing for programming languages

Marpa, although nice on first glance (parser shootout w/ Marpa in the lead), from a maintainability/forkability perspective is a nightmare. All the code is written by one guy, hence the bus factor is 1. The only real documentation of the code is this handwavy paper focused on the algorithm's complexity, the API overview, and then the literate-programming book of code. The literate stuff seems like it's necessary to deal with the syntactic overhead of writing C code, but if you're writing a new programming language that stuff isn't relevant and it turns a 10-line algorithm into a 30-page nightmare. There's an attempt at re-implementing it in C# but unfortunately I think it's nowhere near Marpa in terms of efficiency or features since the author isn't well-versed in parsing theory. The Marpa guy honestly doesn't have much background either but I guess he's old enough to have been in university when these algorithms were the hot research topic and luckily some of it stuck.

One interesting feature of Marpa that isn't in the C code is its two-level grammar support (the "scanless interface" or SLIF). I'm not sure how it works since I didn't look at the Perl, but there seems to be something called an earleme, and these get processed for each character position. And then it uses two Marpa-C grammars. At the surface level, though, you just have two types of rules, lexical rules and grammar rules, both CFGs, and some extra lexical rules about discarding whitespace and finding the "longest acceptable token match".

The Spoofax Workbench implements a similar two-level "scannerless" thing, though they use GLR. I didn't look at the (Java-ish) code at all, but conceptually at least they have an explicit "whitespace expansion" step which inserts optional lexical whitespace productions between every grammar rule. Then greedy matching is implemented by means of follow restrictions. They also have keyword priorities so that reserved words generate syntax errors when they are used as identifiers; this seems like a misfeature though. Why reject a perfectly non-ambiguous program just because of a stupid language specification? In general, just browsing the documentation, it seems like GLR parsing leads to mysterious parse errors when trying to disambiguate using priority or associativity. Marpa hasn't seen much real-world use, so it could be a problem there too, I don't know.

Another interesting point is the rule alternative syntax. Marpa models itself after EBNF with the pipes, and then has actions after each defining how to construct the tree, like "A := B { B a } | C { C a }". Spoofax uses separate rule alternative lines for each, like "A.B = B\nA.C = C". The action syntax closely models yacc/bison/etc. and their philosophy, while the Spoofax approach gives constructor names and hence an automatic ATerm ("Annotated Term") format representation to the grammar. For comparison DMS just uses "A = B\nA = C" and then uses "token numbers" (which seem to act roughly like grammar source line numbers but less useful) to specify which production is used in their parser output. This is the least overhead writing-wise for grammars but seems unmaintainable. A compromise might be numbering the constructors, A.1 A.2 etc., which is implicit like DMS but more closely maps to code structure instead of surface syntax.

DMS just uses a 'real' regexp-based lexer, I guess it's alright but what I really want is composable/scopable lexing so that you can embed arbitrary fragments of one language inside another language. Which doesn't really work if you have quotes and escapes etc. that vary from language to language. This paper on "local lexing" seems like the right idea, maybe impossible to implement well though. At least there's actual code.

Watching Parr on ANTLR, it seems pretty clear you need a generalized tool. Grammar ambiguity isn't obvious in practice (detecting it is undecidable in general); I guess you could try statically checking for it, but honestly letting the grammar be ambiguous and fixing the common cases seems like the right approach. And putting in parentheses isn't that expensive for readability, while keeping track of hundreds of disambiguation rules is. Even doing some ML approach like predicting which rule is more useful is probably going to break in practice. The rest of the talk is mostly Parr spreading FUD and bashing everything besides ANTLR. He does bring up another idea though, which is how to process the parse tree afterwards. He listed OMeta, TXL, Stratego, TOM/Java, Rascal, and his own XPath-based ANTLR visitor-pattern thing as possibilities/inspiration; I'll research those further.

The most promising approach I missed previously was parsing with derivatives; it's hard to tell what the algorithm does though, since it modifies the grammar rather than a state machine.

Wednesday, January 16, 2019

The end

The nature of death has always been a subject of debate. One might compare it to sleep, another poorly understood phenomenon. On the one hand people die all the time, so it should be well-understood. On the other hand controlling the process of death (or indeed that of sleep) is extremely difficult. Hence the lack of understanding.

Even defining death is a difficult task. The subjective experience of death is by definition not possible to retrieve. The closest we get is near-death experiences, which are pretty much indistinguishable from comas or sleeping.

Another problem is discussing death; death anxiety means that discussions or even contemplation of the subject will quickly lead to dropping the subject. Although perhaps this is because there just isn't much to say.

Wednesday, January 9, 2019

PL Design process

Designing a programming language without some idea of what it will be used for is a recipe for failure. The first observation is that programming languages tend to end up with particular patterns or 'styles' of coding and these features get used heavily. For example in Haskell, they have some giant record for keeping track of state, some ADT-tree that gets manipulated, and then various glue code in between. TCL tends towards small programs and throwaway scripts, when bash isn't powerful enough and you need complex loop logic or regular expressions. Agda, Coq, and other proof assistants are so complex to write that they are only written with their accompanying IDE. C++ tends towards giant methods ensconced in a few classes, plus a lot of small copy-paste classes for data storage and interface munging. etc. Honestly though my sample size is small so evaluating this is hard.

Another idea was the schlep filter: writing a compiler is tedious and uninteresting. Hence any language gets a small boost once it's implemented, as other people will use it instead of writing their own. We can imagine a conversion funnel: name recognition -> basic syntax -> toy program -> side project -> startup -> active language developer. Designing a simple website for the PL and measuring conversion rates over time (even if/while it's not implemented) might be useful.

The uninteresting part isn't quite true, by my estimates a new PL gets posted to HN around every 3 months or so. Perhaps information/attention is the most valuable commodity; nobody knows what PL's are out there or what's suited to their project. They might know the library or tool they need though, hence that fixes their choice of PL.

PL design seems to be mostly an art, similar to painting or writing. Since notation is a tool of thought a well-designed PL ends up reshaping the view of the programmer; thoughts that are hard to express are avoided while thoughts that are easy are reinforced. We measure the performance of a PL as in the last post, possibility first then speed and correctness.

Hence I think the best start is a programming examples collection, one might almost say a test suite, where we use and/or implement various abstractions and algorithms. In literate programming the documentation comes with code; here we want the code to be the documentation. It's not so much self-documenting as the knowledge, the meaning, comes from experience with other PL's. For example map is a common function in Haskell but in other languages like C one never sees it as writing it is really hard. And once we write down enough algorithms maybe we can combine them together to form a compiler. So examples should focus on a few beginner coder examples but then mostly compiler algorithms. And then as the language progresses the collection will go from being interesting data to executable test suites. Another good idea is Stone Forth, where any abstraction used must also have its implementation expressed. But this might bog things down, hence why everything starts as a sketch rather than an implementation. We want to design a language starting from the things we want to write and not based on what can be implemented easily, rather that it's implementable at all. VPRI demonstrated with their Nile project that programming language design can make some problems very simple. But Nile itself is pretty big so it's not clear that they actually end up with a reduction in complexity.

Wednesday, January 2, 2019

Project Stroscot

Using a Markov name generator and pasting in Github names, Esolangs and jokes, and Wikipedia names, I came up with the name Stroscot for my next programming language. I have created a Github repo and subreddit for it.

Any design for a language should first take into consideration the fate of ALGOL 68. Years of committee meetings resulted in an incomprehensible and almost unusable specification, defining numerous overspecialized features in exacting detail and often using precise formalism. The goal of a language is to assist the programmer in "the reliable creation of sophisticated programs". Hence, the three tests for a language are: Can you write the program? Can you write the program quickly? Can you write the program correctly?

Declarative programming is basically dead, besides SQL and maybe Prolog; even SQL doesn't look that declarative anymore, it's basically functional programming in disguise, just with a little query planner. Prolog is a bit more interesting but at this point it's basically synonymous with depth-first SLD resolution and everybody who writes Prolog assumes those are the execution semantics; hence, it's just an interesting syntax for imperative programming with some backtracking capabilities thrown in. Anyways, the conclusion is that since "declarative programming" is defined as "anything that's not imperative", and declarative programming is dead, Stroscot will be an imperative programming language.

As far as types go, I'm still not decided. I definitely want to implement a lot of static analysis passes. But maybe types are a decoy. Type inference is basically a limited form of flow analysis and liveness. Gradual types and inferred types are ideas floating around though.

In the medium term, generating reasonably-efficient compiled code is a goal. Doing assembly directly is a mess, although Intel maintains this x86 instruction library in C. Stone knife Forth is a good start along those lines. Potion is similar with a hard-coded JIT. There are also mid-level JIT libraries. Zed Shaw wrote a little prototype EaRing on top of GNU Lightning (slides); GNU Lightning is basically dead though and really crufty so a new one should use libjit or something (recommended here). The most recent thing is probably WebAssembly but I'm not sure how useful targeting that would be. GHC and friends are experimenting with using LLVM; LLVM was designed with C++ in mind though so it's not really flexible enough for assembly. But Kavon Farvardin is working on that.

The other big issue with compilation is memory management; implementing garbage collectors is a pain, and I haven't really researched them. So for now I'll focus on just an interpreter since then I can use the host language's GC. Alternatively this micro virtual machines project looks interesting; at the very least they have a Rust Immix GC implementation. Skipping GC and doing a memory-safe language is also a possibility, but borrow checking seems hard. This guy spent a year on "Gradual Memory Management" and still can barely compile a simple C-like language. Although he's working on a harder problem; Rust can't even do a doubly-linked list without going into unsafe code or adding reference-counting overhead and basically doing GC yourself. Still another possibility is to just use malloc and free and the various specialized allocators like normal C programmers do and then maybe add static or dynamic analysis to detect errors. Tracing is invasive so supporting explicit memory and memory managed systems simultaneously is still mostly an open research problem. But explicit is efficient: see this paper for "Destination Passing Style" memory usage, where their optimizer produces the fastest C code.

Henri Tuhola describes compiling via CPS as "insanely effective", which seems hard to argue with. There's a CPS to SSA pass here. According to this mail, SSA uses a DAG to represent scoping while CPS uses a tree and parameter passing, hence CSE is harder in CPS. Some constructs in CPS can't be translated to SSA though, mostly ones generated by callcc and related functions. The other supposed problems with CPS are that it fixes the evaluation order and that the transformed terms are harder to reason about or use rewrite rules on. This paper says that CPS improves binding time analysis and flow analysis by exposing more static calls. But again CPS doesn't work well with LLVM.
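To make the CPS idea concrete, here's a tiny call-by-value CPS conversion for a toy lambda calculus; the AST shapes and fresh-name scheme are my own, so it's a sketch of the transform, not any particular compiler's:

```python
# terms in:  ("var", x) | ("lit", n) | ("lam", x, body) | ("app", f, a)
# terms out also use:
#   ("lam", x, k, body)   a user lambda that now takes a continuation k
#   ("cont", v, body)     a continuation
#   ("ret", k, v)         pass the value v to continuation k
#   ("call", f, a, k)     call CPS'd function f with argument a and continuation k
counter = 0
def fresh(prefix):
    global counter
    counter += 1
    return f"{prefix}{counter}"

def reify(k):
    """Turn a Python-level continuation into an object-level ("cont", ...) term."""
    v = fresh("v")
    return ("cont", v, k(("var", v)))

def cps(term, k):
    """k maps an atomic value-term to the rest of the CPS'd program."""
    tag = term[0]
    if tag in ("var", "lit"):
        return k(term)
    if tag == "lam":
        _, x, body = term
        kv = fresh("k")
        return k(("lam", x, kv, cps(body, lambda v: ("ret", ("var", kv), v))))
    if tag == "app":
        _, f, a = term
        return cps(f, lambda fv:
               cps(a, lambda av:
               ("call", fv, av, reify(k))))
    raise ValueError(tag)

# (\x. x) 1, with a top-level "halt" continuation
print(cps(("app", ("lam", "x", ("var", "x")), ("lit", 1)), lambda v: ("halt", v)))
```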

What evaluation order do we want? For Haskell it is call-by-need, for ML and most other languages it is call-by-value. Personally though I am attracted to optimal reduction; it optimizes repeated application to be logarithmic, and also exposes some opportunities for fusion, deforestation, etc. And the underlying representation, interaction nets, can support call-by-name and call-by-value and also call-by-need. The issues seem to be (1) blowup in the graph representation due to redundant fans and duplication (2) handling recursion (e.g. let) (3) no efficient implementation of the 'bookkeeping' oracle which determines whether two fans should cancel or copy (4) graph representation / reduction overhead. I tried hacking on an optimal reduction implementation for a week and my conclusion is that it has a high learning curve and high tool support requirement but that it's workable. Solving the bookkeeping overhead seems like a PhD topic though so an initial prototype could just use closed reduction, it's still better than call-by-need.