Tuesday, October 31, 2017

Type system features

Subtyping and open data types

One interesting feature of type systems is extensibility; we have the typical object-oriented inheritance, but this is actually too narrow for a proper type system. We instead have the notion of an extended conversion, a function from the subtype to the supertype. Useful examples start from integer conversions, but the most interesting example is adding more constructors to a datatype: A | B is a subtype of A | B | C. We can also do it with product types, by removing elements: (A, B, C) is a subtype of (A, B), which is a subtype of A. It makes more sense with labels and records, though: {foo: A, bar: B} is a subtype of {foo: A}. This is the closest it gets to inheritance in a type system. Implementations are few, unfortunately; it's primarily Stephen Dolan's work on BRICK (blog archived) and MLsub. BRICK is notable for supporting equirecursive and regular types, so that records do not need a newtype to destructure or bind (similar to dynamically typed languages such as Python), and for compiling to LLVM (no GC, unfortunately). MLsub is just a boring extension with lots of proofs of correctness in the thesis and the coined terms "biunification" and "bisubstitution". Dolan basically tries to convince you that his subtyping algorithm is almost as simple to implement as Hindley-Milner, and honestly I agree.
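Haskell has no width subtyping, but the conversion view above can be sketched directly in it; all the type and function names below are invented for illustration, each conversion witnessing one subtyping judgment:

```haskell
-- Subtyping as explicit conversion functions (a sketch; Haskell itself
-- has no subtyping, so each judgment becomes an ordinary function).

-- Sum types: A | B is a subtype of A | B | C, witnessed by an injection.
data AB  = A1 | B1       deriving (Eq, Show)
data ABC = A2 | B2 | C2  deriving (Eq, Show)

upcastSum :: AB -> ABC
upcastSum A1 = A2
upcastSum B1 = B2

-- Records: {foo: A, bar: B} is a subtype of {foo: A},
-- witnessed by forgetting the extra field.
data FooBar = FooBar { fooFB :: Int, barFB :: Bool }
newtype Foo = Foo { foo :: Int }

upcastRec :: FooBar -> Foo
upcastRec r = Foo (fooFB r)
```

In MLsub these conversions are implicit (the checker inserts no code at all; subtyping is purely a relation on types), which is exactly what makes it more pleasant than writing the injections by hand.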

The most interesting part of subtyping is that it explicitly exposes the idea of polarization or duality; for example, a function arrow (A -> B) is contravariant in A and covariant in B. When we type check, we can divide the type variables into two classes: the pos-variables (lower bounds) and the neg-variables (upper bounds). In fact we usually need both: for mutable variables, arrays, etc. we need both in and out parameters (what can be stored vs. what can be retrieved). For example, the parameter type for a function that accepts a mutable array of Ints and only reads from it would be MutArray[store=⊥, retrieve=Int], where the store is a lower bound and the retrieve is an upper bound.
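Even without subtyping, the polarity of the arrow can be witnessed in Haskell by an explicit lifting function (`dimapArrow` is an invented name; it is the `dimap` of the standard `Profunctor` class specialized to `(->)`):

```haskell
-- (a -> b) is contravariant in a and covariant in b:
-- given a conversion INTO a and a conversion OUT OF b,
-- we can convert the whole function. Note the reversed
-- direction of the first argument -- that is contravariance.
dimapArrow :: (a' -> a) -> (b -> b') -> (a -> b) -> (a' -> b')
dimapArrow pre post h = post . h . pre
```

For example, `dimapArrow length show (+ 1)` turns the `Int -> Int` function `(+ 1)` into a `String -> String` function; applied to `"abc"` it gives `"4"`.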

Dependent types

Once you have subtyping, what point is there in distinguishing values and types? Letting them be used almost interchangeably requires some complicated compilation tricks and an interactive interpreter, but Idris has shown that it works reasonably well in a practical programming language. But dependent types can be extended further.
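GHC cannot yet mix values and types freely, but the DataKinds/GADTs fragment already gives a taste of the Idris style. This length-indexed vector is the standard textbook example, not code from any of the projects mentioned here:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Type-level naturals, obtained by promoting an ordinary datatype.
data Nat = Z | S Nat

-- A vector whose length is tracked in its type.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- A total head function: the index rules out the empty vector,
-- so no runtime check (and no Maybe) is needed.
vhead :: Vec ('S n) a -> a
vhead (VCons x _) = x
```

Full dependent types would let the index be an arbitrary value rather than a promoted datatype, which is the gap Idris closes.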

Insanely dependent types let you define dependent pairs (Σ (a : A). B a, in Idris terms a ** B a) using a dependent product (Π, (a : A) -> B a). There's some trickery involved in the implementation; see the source for details. The short answer is that you need extra scope to reference all the values, and a careful evaluation strategy to avoid infinite loops. The related idea of circular type signatures allows indexing datatypes by values of the same datatype: for example, a matrix indexed by a vector, while defining a vector as a single-dimension matrix of length n. (Their notion of matrix is n-dimensional, so really it's a tensor.) Also interesting from that reddit thread is the idea of making values (terms) a subtype/instance of their type, so that for example 1 is a subtype of Integer.

Richard Eisenberg is working on adding dependent types to Haskell, although only the ordinary kind, not the insanely dependent variety.

Linear Types

We could borrow the exponential connectives from Girard's linear logic: ! and ?. On the hypothesis side, !A may be freely duplicated (used more than once) or discarded (not used at all), recovering ordinary intuitionistic behavior; ? is its dual on the conclusion side. There remains the linear arrow ⊸, and the usual intuitionistic arrow A -> B then desugars as !A ⊸ B. But this cannot be the final answer, because classically we can decompose A -> B as ¬(A ∧ ¬B); functions do not really exist. Following this paper, it seems to make more sense to just add multiplicities to the constructors: 0, 1, and ω for erasable, linear, and ordinary types respectively. A note there mentions extending ω to having multiple types.
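GHC has since grown a LinearTypes extension (landed in GHC 9.0, after this post) that implements roughly this multiplicity design; a minimal sketch, assuming a GHC new enough to have the extension:

```haskell
{-# LANGUAGE LinearTypes #-}

-- %1 marks a linear arrow: the argument must be consumed exactly once.
-- Pattern-matching the pair and using each component once satisfies this.
linearSwap :: (a, b) %1 -> (b, a)
linearSwap (x, y) = (y, x)

-- An ordinary arrow has multiplicity ω: the argument may be used
-- any number of times. The same body with a %1 arrow would be rejected,
-- since x is consumed twice.
dup :: a -> (a, a)
dup x = (x, x)
```

The multiplicity lives on the arrow rather than the type, which is exactly the departure from the !A ⊸ B presentation described above.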

We could also use monads or their more general cousins, Arrows, but why?

Tweag says they're adding them to GHC, although it's hard to say when that will land.

Higher-order Types

This paper seems interesting and useful, and shouldn't be too hard to add.

Extended type classes

Haskell's type classes have been wildly successful; from an implementation perspective, they amount to adding a constraint-solving engine that discovers and writes in new terms. But I think type classes should be able to add new type information too; unfortunately, my efforts in that direction have so far been unsuccessful. It seems like a simple idea: the set of type class instances in scope is fixed, so if only one instance unifies with the types in scope, then it must be the instance to use (otherwise the term must be resolved elsewhere). This is to be distinguished from GHC's current implementation, which has a more restrictive condition requiring matching instead of unification, where matching is defined to add no new type information. The question is how to propagate the information from unification in the constraint solver back to the type checker at large; I don't know enough about GHC's internals to diagnose the errors I ran into in the test suite.
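The matching-vs-unification distinction can be seen in miniature (the class and instance here are invented for illustration):

```haskell
-- Exactly one instance of Default is in scope, yet GHC will not select
-- it by uniqueness: a use of 'def' at an unknown type merely UNIFIES
-- with 'Default Int'. GHC requires the instance head to MATCH, i.e.
-- resolution may not refine the type, so the annotation below is needed.
class Default a where
  def :: a

instance Default Int where
  def = 0

x :: Int   -- delete this signature and GHC reports an ambiguous type
x = def
```

Under the rule proposed above, the lone instance would itself justify refining the type of `x` to `Int`, and the annotation would be unnecessary.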

Type-directed name resolution

One of the most annoying parts of Haskell is that record field names must be globally unique, since they double as selector functions. This has been worked on here, but the problem also shows up with type classes (particularly RebindableSyntax) and in general with similarly-named or similarly-behaving functions. What would be ideal, IMO, is if Haskell supported ad-hoc overloading, so you could use a name as many times as you wanted and the compiler disambiguated by type. This would also get rid of the 'import Data.Map as M, Data.Set as S' boilerplate. But it would require intermingling GHC's renaming and type-checking phases, so it's unlikely to happen.
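GHC 8.0's DuplicateRecordFields extension (one outcome of the OverloadedRecordFields effort) does go partway: duplicate field names are allowed within a module as long as each use can be pinned to a constructor. A small sketch:

```haskell
{-# LANGUAGE DuplicateRecordFields #-}

-- Two records in one module may now share a field name...
data Person  = Person  { name :: String }
data Company = Company { name :: String }

-- ...as long as each occurrence is disambiguated, here by the
-- constructor in the pattern rather than by type inference.
personName :: Person -> String
personName Person { name = n } = n
```

This is still resolution by syntactic context, not the full type-directed disambiguation argued for above, which is why the general overloading problem remains open.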
