Saturday, August 22, 2020

The perfect time-frequency transform

A while back I took a music class. We had to do a final project, for which I researched time-frequency transforms. These take a sound signal in the time domain and produce a 2D graph (a spectrogram) of the intensity at each time/frequency pair. For example, there is the short-time Fourier transform (STFT), which takes sliding slices of the signal, runs the Fourier transform over each slice to get a frequency spectrum, and lays the resulting spectra side by side to form the spectrogram.
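
As a concrete illustration (my own sketch, nothing to do with the class project), scipy's STFT shows the time/frequency grid you end up with: one column of frequencies per window position.

```
import numpy as np
from scipy import signal

fs = 8000                                      # sample rate in Hz
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * (200 + 300 * t) * t)    # a linear chirp
f, tt, Zxx = signal.stft(x, fs=fs, nperseg=256)
print(Zxx.shape)  # (len(f), len(tt)): one frequency spectrum per window position
```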

The problem with the STFT is its resolution: it requires picking a window size for the sliding slices, and that window limits the resolution you can get (a short window smears frequency, a long window smears time). The right way to handle sampling is to construct an ideal signal using the Whittaker–Shannon interpolation formula, transform that into an ideal time-frequency spectrum, and then compute an average/integral over the area of the spectrum corresponding to each pixel.
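
The Whittaker–Shannon step is easy to write down; here's a minimal sketch assuming uniformly spaced samples at rate fs (the function name and variables are mine, just for illustration):

```
import numpy as np

def sinc_interpolate(samples, fs, t):
    """Evaluate the ideal band-limited reconstruction x(t) = sum_n x[n] sinc(fs*t - n)."""
    n = np.arange(len(samples))
    t = np.asarray(t, dtype=float)
    # np.sinc is the normalized sinc, sin(pi x)/(pi x), which is what the formula needs.
    return np.sum(samples[None, :] * np.sinc(fs * t[:, None] - n[None, :]), axis=1)

# Example: reconstruct a 5 Hz sine between its sample points.
fs = 50.0
ts = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 5 * ts)
print(sinc_interpolate(x, fs, [0.013, 0.5017]))  # close to sin(2*pi*5*t) at those times
```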

So, handling sampling is easy and just requires summing a double integral of the transform of a sinc function a number of times. But what is the actual transform? Reviewing the scientific literature I found many different transforms; the two most interesting were minimal cross-entropy (MCE-TFD) and the tomography time-frequency transform (TTFT). The MCE method tries to minimize the entropy of the spectrum, using an iterative process. The TTFT uses the fractional Fourier transform to get intensities along each time/frequency angle, and then uses the inverse Radon transform to turn these projections into a 2D spectrum.
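
To make the TTFT idea concrete, here is a rough sketch of the reconstruction step. The frft argument is assumed to be some fractional Fourier transform routine (not shown here), with the convention that order 1 is the ordinary Fourier transform at 90°; the inverse Radon step is skimage's iradon.

```
import numpy as np
from skimage.transform import iradon

def ttft(x, angles_deg, frft):
    """Tomographic time-frequency sketch: project at each angle, then invert."""
    # Each projection is the squared magnitude of the fractional Fourier
    # transform at that rotation angle (one column of the sinogram).
    sinogram = np.stack(
        [np.abs(frft(x, np.deg2rad(a) / (np.pi / 2))) ** 2 for a in angles_deg],
        axis=1)
    # The inverse Radon transform turns the projections back into a
    # 2D time-frequency image.
    return iradon(sinogram, theta=angles_deg)
```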

The TTFT perfectly captures linear chirps; a linear chirp \(\sin((a+ b t) t)\), whose instantaneous frequency \(a + 2 b t\) sweeps linearly, shows up as a line on the time-frequency spectrum. But when two or more chirps are present, the TTFT shows interference between them, due to a quadratic cross-term. The MCE method minimizes entropy, not the cross-term, so it too has interference, although less of it. So the question is: can we get rid of the interference? That amounts to the transform being linear: the transform of a weighted sum is the weighted sum of the spectra.
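
For intuition about where the interference comes from (my gloss, assuming the TTFT reconstructs a Wigner-like quadratic distribution \(W\)): a quadratic transform of a sum picks up an extra term,

\[ W_{f+g}(t,\omega) = W_f(t,\omega) + W_g(t,\omega) + 2\,\operatorname{Re} W_{f,g}(t,\omega), \]

and it is the oscillating cross-distribution \(W_{f,g}\) that shows up as interference between the two chirps. A linear transform has no such term by definition.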

We also want "locality" or clamping; a waveform that is 0 everywhere but a small time slice should have a spectrum that is 0 everywhere but in that slice. Similarly if the frequencies are limited to a band then the spectrum should also be limited to that band. Additionally we want time-shifting, so that the spectrum of \(f(x+k)\) is the spectrum of \(f\) shifted by \(k\).

So to review the properties:

  • Linearity: \(T(a f + b g) = a T(f)+b T(g) \)
  • Chirps: \(T(\sin((a+b t) t))(t,\omega) = \delta ( \omega - (a+2 b t) ) \) where \(\delta\) is the Dirac delta
  • Locality: \(T(H (k t) f(t))(t,\omega) = H ( k t) T(f(t))(t,\omega) \) where \(H\) is the step function
  • Bandlimiting: if for all \( \omega \in [a,b] \), \( \hat f (\omega) = 0 \), then \( T (f) (t,\omega) = 0 \) for all \( \omega \in [a,b] \)
  • Shifting: \(T(f(t+k))(t,\omega) = T(f(t))(t+k,\omega)\)

The question now is whether such a transform exists. It seems like even these conditions are not enough to pin the transform down uniquely, because writing the function as a sum of chirps isn't enough to get a convergent spectrum.

Monday, January 13, 2020

Stroscot devlog 4

The parser’s kind of working, aside from features like EBNF, scannerless parsing, etc. that will be useful later but aren’t strictly necessary. So now I am writing the middle parts of the Stroscot programming language.

I started out wanting to implement universes with dependent types, in particular experimenting with the New Foundations idea. I also wanted to add subtyping a la Dolan. Then I remembered insane, and how it had to suspend most of the “check” calls in order to check recursive types successfully, so I needed an evaluation model. I wondered if type inference was any more complicated, so I looked at Roy and its unify function. My conclusion was that implementing type inference means adding type variables and type solving (unification) to an existing type checker, but doesn’t change much about the checker’s structure, so starting with a dependent type checker would be best.
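
As a note to myself on what “adding type variables and unification” amounts to, here’s a toy sketch (the names and representation are made up, nothing to do with Roy’s actual code): type variables are lowercase strings, constructors are tuples, and unification threads a substitution through.

```
def walk(t, subst):
    """Follow substitution links until we reach a non-variable or an unbound variable."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    """Occurs check: does variable v appear inside t?"""
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False

def unify(a, b, subst):
    """Extend subst so that a and b become equal, or raise TypeError."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, str) and a[0].islower():          # a is a type variable
        if occurs(a, b, subst):
            raise TypeError("infinite type")
        return {**subst, a: b}
    if isinstance(b, str) and b[0].islower():          # b is a type variable
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):                 # same constructor: unify the arguments
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a} with {b}")

# ("->", arg, result) is a function type; "Int"/"Bool" are base types.
print(unify(("->", "a", "Int"), ("->", "Bool", "b"), {}))  # {'a': 'Bool', 'b': 'Int'}
```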

Then I looked for dependent type checker tutorials and found this; it starts out by presenting an evaluator for the untyped lambda calculus and then extends it gradually. LambdaPi takes a similar approach. They both use environments and closures, but with optimal reduction there are duplicators rather than closures and the environments are stored in the graph, so the implementation is quite different.
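
The environment/closure style those tutorials start from looks roughly like this (my own paraphrase in Python, not their code): terms are ("var", name), ("lam", name, body), ("app", fun, arg).

```
def eval_term(term, env):
    tag = term[0]
    if tag == "var":
        return env[term[1]]
    if tag == "lam":
        # A closure pairs the lambda with the environment it was built in.
        return ("closure", term[1], term[2], env)
    if tag == "app":
        _, name, body, closure_env = eval_term(term[1], env)
        arg = eval_term(term[2], env)
        return eval_term(body, {**closure_env, name: arg})
    raise ValueError(f"unknown term {term!r}")

# (\x -> x) (\y -> y) evaluates to the identity closure.
identity = ("lam", "x", ("var", "x"))
print(eval_term(("app", identity, ("lam", "y", ("var", "y"))), {}))
```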

So I figured I’d start with the optimal lambda calculus implementation in macro lambda calculus. But the first thing it does is a pseudo-alpha-renaming pass that identifies all the free variables and replaces them with wires. The code is also complicated quite a bit by being split into a separate inet-lib library, which takes a custom-syntax interaction net description and uses JavaScript’s eval features to implement the interaction rules. I went and read “Optimality and inefficiency”; they show poor behavior on C_n = \x -> (2 (2 (2 (2 x)))), where 2 = \s z -> s (s z), when applied to \x y -> x y and \x y -> y x. I still like optimality though; the issues they present are unrelated to interaction nets and can be worked around with the garbage collection in Asperti/BOHM.

I looked for more implementations of LambdaScope, and it seems that Jan Rochel’s graph-rewriting-lambdascope package (Haskell) is the only one. There are also MaiaVictor’s optlam / absal (JS) and Vizaxo’s optal (Haskell), but they use Lamping’s algorithm, which has brackets/croissants and more overhead. They’re still worth looking at for ideas on how to implement interaction nets, though.

I was thinking that perhaps Kernel/Fexprs had an interesting solution to variable lookup, since the environment is passed explicitly, so I was reading through the thesis and got lost. I tried reading the paper instead, and it also seems very long and full of Scheme-isms, with not much on lambdas. The primitives are wrap and vau; somewhere in there we can insert optimal evaluation, but I didn’t get to environments yet.


Thursday, January 9, 2020

Stroscot Devlog 3

So I've been reading a lot about Earley parsers. They're divided into predictor, completer, and scanner sections. There are some “interesting observations” in the “Directly Executable Earley Parsing” paper: mainly, that additions are only ever made to the current and next Earley sets. But we still have to store the Earley sets; on the other hand, I noticed that we only ever have to look items up by the next nonterminal in the production (the one after the dot), so we can index them by that nonterminal and look them back up to retrieve them. But it seems clear that the sets are mostly for notational, pedagogical, and debugging purposes; the algorithm only needs the lists of suspended nonterminals pointed to by the in-progress nonterminals.
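
For reference, here is a bare-bones recognizer showing the three sections (toy grammar, invented names, no optimizations, and no special handling of nullable rules):

```
GRAMMAR = {
    "Sum": [["Sum", "+", "Num"], ["Num"]],
    "Num": [["1"], ["2"]],
}

def is_nonterminal(sym):
    return sym in GRAMMAR

def earley_recognize(tokens, start="Sum"):
    # An item is (lhs, rhs, dot, origin set index).
    sets = [set() for _ in range(len(tokens) + 1)]
    for rhs in GRAMMAR[start]:
        sets[0].add((start, tuple(rhs), 0, 0))
    for i in range(len(tokens) + 1):
        worklist = list(sets[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if is_nonterminal(sym):
                    # Predictor: expand the nonterminal after the dot.
                    for alt in GRAMMAR[sym]:
                        new = (sym, tuple(alt), 0, i)
                        if new not in sets[i]:
                            sets[i].add(new)
                            worklist.append(new)
                elif i < len(tokens) and tokens[i] == sym:
                    # Scanner: the terminal matches, so advance into the next set.
                    sets[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completer: advance items in the origin set that were waiting on lhs.
                for plhs, prhs, pdot, porigin in list(sets[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        new = (plhs, prhs, pdot + 1, porigin)
                        if new not in sets[i]:
                            sets[i].add(new)
                            worklist.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in sets[len(tokens)])

print(earley_recognize(["1", "+", "2"]))  # True
```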

The other thing I looked at was the Earley-LR optimization, basically turning the grammar into an LR(0) automaton with shift/reduce ambiguity resolved by Earley sets, and generally speaking it seems like it can speed up the parser. Each Earley item is normally a dotted production rule together with a parent set, but you can replace the production rule with an LR(0) DFA state, which is a set of LR(0) items.
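
The LR(0) states in question are just closed sets of dotted rules; here's a sketch of the closure/goto operations (same toy grammar format as above, nothing Marpa-specific):

```
def lr0_closure(items, grammar):
    """items: set of (lhs, rhs, dot) dotted rules; close under prediction."""
    closure, worklist = set(items), list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in grammar:        # nonterminal after the dot
            for alt in grammar[rhs[dot]]:
                new = (rhs[dot], tuple(alt), 0)
                if new not in closure:
                    closure.add(new)
                    worklist.append(new)
    return closure

def lr0_goto(items, symbol, grammar):
    """Advance the dot over `symbol` in every item that allows it, then close."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return lr0_closure(moved, grammar)
```

An Aycock/Horspool-style Earley item would then pair one of these states with a parent set, instead of pairing a single dotted rule with a parent set.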

According to Jeffrey Kegler: “In creating Marpa, I may have acquired as much experience with the AH algorithm as anybody, possibly even Aycock/Horspool themselves. I found the special LR(0) items they create to be counter-productive. AH's own implementation apparently had bugs due to duplication of Earley items. I solved these, but it made the code very complex and slower. Most of the compression that the AH method bought was in predictions, which could be compressed in other, less complicated, ways. Using AH items means that whenever something needs to be done in terms of the user's grammar, there has to be a translation. Since the AH item to Earley item relationship is many-to-many, this is not trivial. The cost of the more complex evaluation phase seems likely to eliminate any speed-up in the recognizer.

In “Practical Earley Parsing”, as a side issue, they describe a way of handling nulled symbols and productions via grammar rewrites. This proved very valuable and Marpa *does* continue to use that. It's a bit ironic that I ended up ignoring a lot of AH's paper, but that their little implementation trick has proved central to my efforts.” (edited, see here)

So in summary, LR doesn’t add much to Earley parsing, but computing nullability and handling it specially does. It seems like they're optimizing their parser a lot, but optimizing the wrong things. Memory layout and cache use are the main things to pay attention to on modern processors, and they don't seem to have a very interesting layout. And the radix tree for Earley items seems completely pointless: you have to iterate over all the pending Earley items anyway, so you might as well just use a linked list.
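
The nullability computation itself is just a small fixed point; a sketch in the same grammar format as above:

```
def nullable_symbols(grammar):
    """Return the set of nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs in nullable:
                continue
            for rhs in alternatives:
                # Nullable if some alternative consists entirely of nullable
                # symbols (the empty right-hand side counts trivially).
                if all(sym in nullable for sym in rhs):
                    nullable.add(lhs)
                    changed = True
                    break
    return nullable

# Example: A can derive the empty string, B cannot.
print(nullable_symbols({"A": [[], ["A", "A"]], "B": [["b"], ["A", "b"]]}))  # {'A'}
```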

The practical paper also leads down the rabbit hole of directly executable code / threaded execution / direct threaded code. It’s useful in parsers, but it appears in any interpreter, e.g. Python. The basic idea is that instead of jumping back to a dispatch loop, each handler ends with the code that looks up the dispatch table and jumps directly to the next handler; CPython does this with computed gotos, using the labels-as-values (“first class labels”) feature of GNU C. But reading the “Directly Executable Earley Parsing” paper, it just seems like a bunch of pointer manipulation, and it doesn't do the state splitting that they do in their later papers.
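
The control-flow idea, transliterated into Python (CPython's real version is computed gotos in C; this is only an analogue where each handler dispatches the next instruction itself instead of returning to a central while loop):

```
def run(program):
    stack = []

    def dispatch(pc):
        if pc < len(program):                 # falling off the end halts the machine
            op, arg = program[pc]
            HANDLERS[op](pc, arg)

    def op_push(pc, arg):
        stack.append(arg)
        dispatch(pc + 1)                      # jump straight to the next handler

    def op_add(pc, arg):
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)
        dispatch(pc + 1)

    HANDLERS = {"push": op_push, "add": op_add}
    dispatch(0)
    return stack

print(run([("push", 2), ("push", 3), ("add", None)]))  # [5]
```

(This recurses once per instruction, so it's illustration only, not how you'd actually write it.)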

Another question is whether we can implement parser combinators on top of Earley, memoizing all the states somehow. It seems like it might be possible with enough recursion and memoization hacking but probably not worth it.

And I still don't know how to handle extended grammars, and I haven't looked at the Leo paper at all. But from skimming the code of people who’ve implemented it, Leo seems to be just a matter of keeping track of certain extra items. Extended Earley grammars generalize the dot position in a rule, so an item tracks a state in an NFA (DFA?) instead of a position in a linear sequence. This of course should be easily accomplished with the help of some papers I have on extended rules.