## Verifying Data Structures in Haskell

A while ago I read this post on reddit (by David Feuer), about sorting traversables (which was a follow-up on this post by Will Fancher), and I was inspired to write some pseudo-dependently-typed Haskell. The post (and subsequent library) detailed how to use size-indexed heaps to perform fast, total sorting on any traversable. I ended up with a library which has five size-indexed heaps (Braun, pairing, binomial, skew, and leftist), each verified for structural correctness. I also included the non-indexed implementations of each for comparison (as well as benchmarks, tests, and all that good stuff).

The purpose of this post is to go through some of the tricks I used and problems I encountered writing a lot of type-level code in modern Haskell.

### Type-Level Numbers in Haskell

In order to index things by their size, we’ll need a type-level representation of size. We’ll use Peano numbers for now:

`data Peano = Z | S Peano`

`Z` stands for zero, and `S` for successor. The terseness is pretty necessary here, unfortunately: arithmetic becomes unreadable otherwise. The simplicity of this definition is useful for proofs and manipulation; however, any runtime representation of these numbers is going to be woefully slow.
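To make the runtime cost concrete, here is a small value-level sketch (the `plus` and `toInt` helpers are mine, for illustration only): addition must walk its entire first argument, one constructor at a time.

```
data Peano = Z | S Peano

-- Value-level addition mirrors the type family defined below:
-- it recurses over the whole first argument.
plus :: Peano -> Peano -> Peano
plus Z     m = m
plus (S n) m = S (plus n m)

-- Converting to Int makes the linear cost obvious.
toInt :: Peano -> Int
toInt Z     = 0
toInt (S n) = 1 + toInt n

main :: IO ()
main = print (toInt (plus (S (S Z)) (S Z)))
```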

With the `DataKinds` extension, the above is automatically promoted to the type level, so we can write type-level functions (type families) on the `Peano` type:

```
type family Plus (n :: Peano) (m :: Peano) :: Peano where
    Plus Z m = m
    Plus (S n) m = S (Plus n m)
```

Here the `TypeFamilies` extension is needed. I’ll try and mention every extension I’m using as we go, but I might forget a few, so check the repository for all of the examples (quick aside: I *did* manage to avoid using `UndecidableInstances`, but more on that later). One pragma that’s worth mentioning is:

`{-# OPTIONS_GHC -fno-warn-unticked-promoted-constructors #-}`

This suppresses warnings on the definition of `Plus` above. Without it, GHC would want us to write:

```
type family Plus (n :: Peano) (m :: Peano) :: Peano where
    Plus 'Z m = m
    Plus ('S n) m = 'S (Plus n m)
```

I think that looks pretty ugly, and it can get much worse with more involved arithmetic. The only thing I have found the warnings useful for is `[]`: the type-level empty list gives an error in its unticked form.

### Using the Type-Level Numbers with a Pairing Heap

In the original post, a pairing heap (Fredman et al. 1986) was used, for its simplicity and performance. The implementation looked like this:

```
data Heap n a where
    E :: Heap Z a
    T :: a -> HVec n a -> Heap (S n) a

data HVec n a where
    HNil :: HVec Z a
    HCons :: Heap m a -> HVec n a -> HVec (Plus m n) a
```

You immediately run into trouble when you try to define merge:

```
merge :: Ord a => Heap m a -> Heap n a -> Heap (Plus m n) a
merge E ys = ys
merge xs E = xs
merge h1@(T x xs) h2@(T y ys)
  | x <= y = T x (HCons h2 xs)
  | otherwise = T y (HCons h1 ys)
```

Three errors show up here, but we’ll look at the first one:

`Could not deduce (m ~ (Plus m Z))`

GHC doesn’t know that
$x = x + 0$.
Somehow, we’ll have to *prove* that it does.

### Singletons

In a language with true dependent types, proving the proposition above is as simple as:

```
plusZeroNeutral : (n : Nat) -> n + 0 = n
plusZeroNeutral Z = Refl
plusZeroNeutral (S k) = cong (plusZeroNeutral k)
```

(this example is in Idris)

In Haskell, on the other hand, we can’t do the same: functions on the value-level `Peano` have no relationship with functions on the type-level `Peano`. There’s no way to automatically link or promote one to the other.

This is where singletons come in (Eisenberg and Weirich 2012). A singleton is a datatype which mirrors a type-level value exactly, except that it has a type parameter which matches the equivalent value on the type-level. In this way, we can write functions on the value-level which are linked to the type-level. Here’s a potential singleton for `Peano`:

```
data Natty n where
    Zy :: Natty Z
    Sy :: Natty n -> Natty (S n)
```

(we need `GADTs` for this example)

Now, when we pattern-match on `Natty`, we get a proof of whatever its type parameter was. Here’s a trivial example:

```
isZero :: Natty n -> Maybe (n :~: Z)
isZero Zy = Just Refl
isZero (Sy _) = Nothing
```

When we match on `Zy`, the *only value* which `n` could have been is `Z`, because the only way to construct `Zy` is if the type parameter is `Z`.
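Here’s a compilable sketch of the idea in practice (the `Vec` and `replicateN` names are mine, not from the library): matching on the singleton tells GHC exactly how long the resulting vector must be at each step.

```
{-# LANGUAGE DataKinds, GADTs #-}

data Peano = Z | S Peano

data Natty n where
    Zy :: Natty Z
    Sy :: Natty n -> Natty (S n)

data Vec n a where
    VNil  :: Vec Z a
    VCons :: a -> Vec n a -> Vec (S n) a

-- Matching on the singleton refines n, so each clause type-checks.
replicateN :: Natty n -> a -> Vec n a
replicateN Zy     _ = VNil
replicateN (Sy n) x = VCons x (replicateN n x)

toList :: Vec n a -> [a]
toList VNil         = []
toList (VCons x xs) = x : toList xs

main :: IO ()
main = print (toList (replicateN (Sy (Sy (Sy Zy))) 'a'))
```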

Using this technique, the `plusZeroNeutral` proof looks reasonably similar to the Idris version:

```
plusZeroNeutral :: Natty n -> Plus n Z :~: n
plusZeroNeutral Zy = Refl
plusZeroNeutral (Sy n) = case plusZeroNeutral n of
    Refl -> Refl
```

To generalize the singletons a little, we could probably use the singletons library, or we could roll our own:

```
data family The k :: k -> Type

data instance The Peano n where
    Zy :: The Peano Z
    Sy :: The Peano n -> The Peano (S n)

plusZeroNeutral :: The Peano n -> Plus n Z :~: n
plusZeroNeutral Zy = Refl
plusZeroNeutral (Sy n) = case plusZeroNeutral n of
    Refl -> Refl
```

The `The` naming is kind of cute, I think. It makes the signature look *almost* like the Idris version (`the` is a function from the Idris standard library). The `The` data family requires the `TypeInType` extension, which I’ll talk a little more about later.

### Proof Erasure and Totality

There’s an issue with these kinds of proofs: the proof code runs *every time* it is needed. Since the same value is coming out the other end each time (`Refl`), this seems wasteful.

In a language like Idris, this problem is avoided by noticing that you’re only using the proof for its type information, and then erasing it at runtime. In Haskell, we can accomplish the same with a rule:

```
{-# NOINLINE plusZeroNeutral #-}
{-# RULES
"plusZeroNeutral" forall x. plusZeroNeutral x
= unsafeCoerce (Refl :: 'Z :~: 'Z)
#-}
```

This basically says “if this type-checks, then the proof must exist,
and therefore the proof must be valid. So don’t bother running it”.
Unfortunately, that’s a *little bit* of a lie. It’s pretty easy
to write a proof which type-checks that *isn’t* valid:

```
falseIsTrue :: False :~: True
falseIsTrue = falseIsTrue
```

We won’t be able to perform computations which rely on this proof in Haskell, though: because the computation will never terminate, the proof will never provide an answer. This means that, while the proof isn’t valid, it *is* type safe. That is, of course, unless we use our manual proof-erasure technique. The `RULES` pragma will happily replace it with the `unsafeCoerce` version, effectively introducing unsoundness into our proofs. The reason that this doesn’t cause a problem for a language like Idris is that Idris has a totality checker: you *can’t* write the above definition (with the totality checker turned on) in Idris.

So what’s the solution? Do we have to suffer through the slower proof code to maintain correctness? In reality, it’s usually OK to assume termination. It’s pretty easy to see that a proof like `plusZeroNeutral` is total. It’s worth bearing in mind, though, that until Haskell gets a totality checker (likely never, apparently) these proofs aren’t “proper”.

### Generating Singletons

One extra thing: while you’re proving things in one area of your code, you might not have the relevant singleton handy. To generate them on-demand, you’ll need a typeclass:

```
class KnownSing (x :: k) where
    sing :: The k x

instance KnownSing Z where
    sing = Zy

instance KnownSing n => KnownSing (S n) where
    sing = Sy sing
```

This kind of drives home the inefficiency of singleton-based proofs, and why it’s important to erase them aggressively.

### Proofs Bundled with the Data Structure

One other way to solve these problems is to try to find a data structure which runs the proof code anyway. As an example, consider a length-indexed list:

```
infixr 5 :-
data List n a where
    Nil :: List Z a
    (:-) :: a -> List n a -> List (S n) a
```

You might worry that concatenation of two lists requires some expensive proof code, like `merge` for the pairing heap. Maybe surprisingly, the default implementation just works:

```
infixr 5 ++
(++) :: List n a -> List m a -> List (Plus n m) a
(++) Nil ys = ys
(++) (x :- xs) ys = x :- xs ++ ys
```

Why? Well, if you look back to the definition of `Plus`, it’s almost exactly the same as the definition of `(++)`. In effect, we’re using *lists* as the singleton for `Peano` here.
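To see this go through, here is a self-contained sketch (the `toList` helper is my own addition): GHC accepts the appended length with no proof terms, because the recursion of `(++)` runs in lockstep with `Plus`.

```
{-# LANGUAGE DataKinds, GADTs, TypeFamilies #-}
import Prelude hiding ((++))

data Peano = Z | S Peano

type family Plus (n :: Peano) (m :: Peano) :: Peano where
    Plus Z m = m
    Plus (S n) m = S (Plus n m)

infixr 5 :-
data List n a where
    Nil  :: List Z a
    (:-) :: a -> List n a -> List (S n) a

-- Each (:-) peeled off the left list matches one step of Plus,
-- so the index works out definitionally.
(++) :: List n a -> List m a -> List (Plus n m) a
(++) Nil ys = ys
(++) (x :- xs) ys = x :- (xs ++ ys)

toList :: List n a -> [a]
toList Nil = []
toList (x :- xs) = x : toList xs

main :: IO ()
main =
    -- The annotation checks that GHC reduces Plus (S (S Z)) (S Z).
    print (toList ((1 :- 2 :- Nil) ++ (3 :- Nil) :: List (S (S (S Z))) Int))
```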

The question is, then: is there a heap which performs these proofs automatically for functions like merge? As far as I can tell: *almost*. First, though:

### Small Digression: Manipulating and Using the Length-Indexed List

The standard definition of `++` on normal lists can be cleaned up a little with `foldr`:

```
(++) :: [a] -> [a] -> [a]
(++) = flip (foldr (:))
```

Can we get a similar definition for our length-indexed lists? Turns out we can, but the type of `foldr` needs to be a little different:
```
foldrList :: (forall x. a -> b x -> b (S x))
          -> b m -> List n a -> b (n + m)
foldrList f b Nil = b
foldrList f b (x :- xs) = f x (foldrList f b xs)

newtype Flip (f :: t -> u -> Type) (a :: u) (b :: t)
    = Flip { unFlip :: f b a }

foldrList1 :: (forall x. a -> b x c -> b (S x) c)
           -> b m c -> List n a -> b (n + m) c
foldrList1 f b = unFlip . foldrList (\e -> Flip . f e . unFlip) (Flip b)

infixr 5 ++
(++) :: List n a -> List m a -> List (n + m) a
(++) = flip (foldrList1 (:-))
```

So what’s the point of this more complicated version? Well, if this were normal Haskell, we might get some foldr-fusion or something (in reality we would probably use `augment` if that were the purpose).

With this type-level business, though, there’s a similar application: loop unrolling. Consider the natural-number type again. We can write a typeclass which will perform induction over them:

```
class KnownPeano (n :: Peano) where
    unrollRepeat :: Proxy n -> (a -> a) -> a -> a

instance KnownPeano Z where
    unrollRepeat _ = const id
    {-# INLINE unrollRepeat #-}

instance KnownPeano n => KnownPeano (S n) where
    unrollRepeat (_ :: Proxy (S n)) f x =
        f (unrollRepeat (Proxy :: Proxy n) f x)
    {-# INLINE unrollRepeat #-}
```

Because the recursion here calls a different `unrollRepeat` function in the “recursive” call, we get around the usual hurdle of not being able to inline recursive calls. That means that the whole loop will be unrolled, at compile-time. We can do the same for foldr:

```
class HasFoldr (n :: Peano) where
    unrollFoldr :: (forall x. a -> b x -> b (S x))
                -> b m
                -> List n a
                -> b (n + m)

instance HasFoldr Z where
    unrollFoldr _ b _ = b
    {-# INLINE unrollFoldr #-}

instance HasFoldr n => HasFoldr (S n) where
    unrollFoldr f b (x :- xs) = f x (unrollFoldr f b xs)
    {-# INLINE unrollFoldr #-}
```
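As a quick sanity check of the unrolling class, here is a runnable sketch (it repeats the `KnownPeano` definitions above so it stands alone): with the number known at compile time, all three applications of `f` can be inlined.

```
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))

data Peano = Z | S Peano

class KnownPeano (n :: Peano) where
    unrollRepeat :: Proxy n -> (a -> a) -> a -> a

instance KnownPeano Z where
    unrollRepeat _ = const id
    {-# INLINE unrollRepeat #-}

instance KnownPeano n => KnownPeano (S n) where
    -- Each "recursive" call is to a *different* instance's method,
    -- so GHC is free to inline the whole chain.
    unrollRepeat (_ :: Proxy (S n)) f x =
        f (unrollRepeat (Proxy :: Proxy n) f x)
    {-# INLINE unrollRepeat #-}

main :: IO ()
main = print (unrollRepeat (Proxy :: Proxy (S (S (S Z)))) (* 2) (1 :: Int))
```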

I can’t think of many uses for this technique, but one that comes to mind is an n-ary uncurry (like Lisp’s apply):

```
infixr 5 :-
data List (xs :: [*]) where
    Nil :: List '[]
    (:-) :: a -> List xs -> List (a ': xs)

class KnownList (xs :: [*]) where
    foldrT :: (forall y ys. y -> result ys -> result (y ': ys))
           -> result '[]
           -> List xs
           -> result xs

instance KnownList ('[] :: [*]) where
    foldrT _ = const
    {-# INLINE foldrT #-}

instance KnownList xs =>
         KnownList (x ': xs) where
    foldrT f b (x :- xs) = f x (foldrT f b xs)
    {-# INLINE foldrT #-}

type family Func (xs :: [*]) (y :: *) where
    Func '[] y = y
    Func (x ': xs) y = x -> Func xs y

newtype FunType y xs = FunType
    { runFun :: Func xs y -> y
    }

uncurry
    :: KnownList xs
    => Func xs y -> List xs -> y
uncurry f l =
    runFun
        (foldrT
            (c (\x g h -> g (h x)))
            (FunType id)
            l)
        f
  where
    c :: (a -> ((Func xs y -> y) -> (Func zs z -> z)))
      -> (a -> (FunType y xs -> FunType z zs))
    c = coerce
    {-# INLINE c #-}
{-# INLINE uncurry #-}
```

I *think* that you can be guaranteed the above is inlined at compile-time, making it essentially equivalent to a handwritten `uncurry`.

### Binomial Heaps

Anyway, back to the size-indexed heaps. The reason that `(++)` worked so easily on lists is that a list can be thought of as the data-structure equivalent of Peano numbers. Another numeric-system-based data structure is the binomial heap, which is based on binary numbering (I’m going mainly off of the description from Hinze 1999).

So, to work with binary numbers, let’s get some preliminaries on the type-level out of the way:

```
data instance The Bool x where
    Falsy :: The Bool False
    Truey :: The Bool True

data instance The [k] xs where
    Nily :: The [k] '[]
    Cony :: The k x -> The [k] xs -> The [k] (x : xs)

instance KnownSing True where
    sing = Truey

instance KnownSing False where
    sing = Falsy

instance KnownSing '[] where
    sing = Nily

instance (KnownSing xs, KnownSing x) =>
         KnownSing (x : xs) where
    sing = Cony sing sing
```

We’ll represent a binary number as a list of Booleans:

```
type family Sum (x :: Bool) (y :: Bool) (cin :: Bool) :: Bool where
    Sum False False False = False
    Sum False False True = True
    Sum False True False = True
    Sum False True True = False
    Sum True False False = True
    Sum True False True = False
    Sum True True False = False
    Sum True True True = True

type family Carry (x :: Bool) (y :: Bool) (cin :: Bool)
                  (xs :: [Bool]) (ys :: [Bool]) :: [Bool] where
    Carry False False False xs ys = Add False xs ys
    Carry False False True xs ys = Add False xs ys
    Carry False True False xs ys = Add False xs ys
    Carry False True True xs ys = Add True xs ys
    Carry True False False xs ys = Add False xs ys
    Carry True False True xs ys = Add True xs ys
    Carry True True False xs ys = Add True xs ys
    Carry True True True xs ys = Add True xs ys

type family Add (cin :: Bool) (xs :: [Bool]) (ys :: [Bool]) :: [Bool] where
    Add c (x : xs) (y : ys) = Sum x y c : Carry x y c xs ys
    Add False '[] ys = ys
    Add False xs '[] = xs
    Add True '[] ys = CarryOne ys
    Add True xs '[] = CarryOne xs

type family CarryOne (xs :: [Bool]) :: [Bool] where
    CarryOne '[] = True : '[]
    CarryOne (False : xs) = True : xs
    CarryOne (True : xs) = False : CarryOne xs
```

The odd definition of `Carry` is to avoid `UndecidableInstances`: if we had written, instead:

```
type family Carry (x :: Bool) (y :: Bool) (cin :: Bool) :: Bool where
    Carry False False False = False
    Carry False False True = False
    Carry False True False = False
    Carry False True True = True
    Carry True False False = False
    Carry True False True = True
    Carry True True False = True
    Carry True True True = True

type family Add (cin :: Bool) (xs :: [Bool]) (ys :: [Bool]) :: [Bool] where
    Add c (x : xs) (y : ys) = Sum x y c : Add (Carry x y c) xs ys
    Add False '[] ys = ys
    Add False xs '[] = xs
    Add True '[] ys = CarryOne ys
    Add True xs '[] = CarryOne xs
```

We would have been warned about nested type-family application.
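A value-level mirror of these families makes a handy sanity check of the truth tables (this sketch is my own; the names `sumB`, `carryB`, and `toInt` are not from the library). Note that at the value level the nested-recursion style that needs `UndecidableInstances` on the type level is perfectly fine.

```
-- Binary numbers as lists of bits, least-significant bit first,
-- mirroring the type-level representation.
type Bin = [Bool]

sumB, carryB :: Bool -> Bool -> Bool -> Bool
sumB x y cin = x /= (y /= cin)                          -- three-way xor, like Sum
carryB x y cin = (x && y) || (x && cin) || (y && cin)   -- majority, like Carry

-- A ripple-carry adder, clause for clause like the Add family.
add :: Bool -> Bin -> Bin -> Bin
add c (x : xs) (y : ys) = sumB x y c : add (carryB x y c) xs ys
add False [] ys = ys
add False xs [] = xs
add True [] ys = carryOne ys
add True xs [] = carryOne xs

carryOne :: Bin -> Bin
carryOne [] = [True]
carryOne (False : xs) = True : xs
carryOne (True : xs) = False : carryOne xs

toInt :: Bin -> Int
toInt = foldr (\b n -> fromEnum b + 2 * n) 0

main :: IO ()
main = print (toInt (add False [True, False, True] [True, True]))  -- 5 + 3
```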

Now we can base the merge function very closely on these type families. First, though, we’ll have to implement the heap.

### Almost-Verified Data Structures

There are different potential properties you can verify in a data structure. In the sort-traversable post, the property of interest was that the number of elements in the structure would stay the same after adding and removing some number $n$ of elements. For this post, we’ll also verify structural invariants. I won’t, however, verify the heap property. Maybe in a later post.

When indexing a data structure by its size, you encode an awful lot of information into the type signature: the type becomes very *specific* to the structure in question. It is possible, though, to encode a fair few structural invariants *without* getting so specific. Here’s a signature for a “perfect leaf tree”:

`data BalTree a = Leaf a | Node (BalTree (a,a))`

With that signature, it’s *impossible* to create a tree with more elements in its left branch than its right; the size of the tree, however, remains unspecified. You can use a similar trick to implement matrices which must be square (from Okasaki 1999): the usual trick (`type Matrix n a = List n (List n a)`) is too specific, providing size information at compile-time. If you’re interested in this approach, there are several more examples in Hinze (2001).
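Here is a small sketch of the nested-type trick in action (the polymorphic-recursive `size` helper is my own): every `Node` layer pairs up all the elements, so the element count doubles with each level and the tree is perfectly balanced by construction.

```
-- A perfect leaf tree: each Node layer pairs up all the elements.
data BalTree a = Leaf a | Node (BalTree (a, a))

-- Polymorphic recursion (hence the local signature): the accumulator
-- tracks how many base elements each value at this layer represents.
size :: BalTree a -> Int
size = go 1
  where
    go :: Int -> BalTree b -> Int
    go k (Leaf _) = k
    go k (Node t) = go (2 * k) t

main :: IO ()
main = print (size (Node (Node (Leaf ((1, 2), (3, 4))))))
```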

It is possible to go from the size-indexed version back to the non-indexed version, with an existential (`RankNTypes` for this example):

```
data ErasedSize f a = forall (n :: Peano). ErasedSize
    { runErasedSize :: f n a
    }
```

This will let you prove invariants in your implementation using an index, while keeping the user-facing type signature general and non-indexed.

### A Fully-Structurally-Verified Binomial Heap

Wasserman (2010) was able to encode all of the structural invariants of the binomial heap *without* indexing by its size (well, all invariants except truncation, which turned out to be important a little later). I’ll be using a similar approach, except I’ll leverage some of the newer bells and whistles in GHC. Where Wasserman’s version used types like this for the numbering:

```
data Zero a = Zero
data Succ rk a = BinomTree rk a :< rk a
data BinomTree rk a = BinomTree a (rk a)
```

We can reuse the type-level Peano numbers with a GADT:

```
infixr 5 :-
data Binomial xs rk a where
Nil :: Binomial '[] n a
Skip :: Binomial xs (S rk) a -> Binomial (False : xs) rk a
(:-) :: Tree rk a
-> Binomial xs (S rk) a
-> Binomial (True : xs) rk a
data Tree rk a = Root a (Node rk a)
infixr 5 :<
data Node n a where
NilN :: Node Z a
(:<) :: Tree n a -> Node n a -> Node (S n) a
```

The definition of `Tree` here ensures that any tree of rank $n$ has $2^n$ elements. The binomial heap, then, is a list of trees, in ascending order of size, with a `True` at every point in its type-level list where a tree is present, and a `False` wherever one is absent. In other words, the type-level list is a binary encoding of the number of elements it contains.

And here are the merge functions:

```
mergeTree :: Ord a => Tree rk a -> Tree rk a -> Tree (S rk) a
mergeTree xr@(Root x xs) yr@(Root y ys)
  | x <= y = Root x (yr :< xs)
  | otherwise = Root y (xr :< ys)

merge :: Ord a
      => Binomial xs z a
      -> Binomial ys z a
      -> Binomial (Add False xs ys) z a
merge Nil ys = ys
merge xs Nil = xs
merge (Skip xs) (Skip ys) = Skip (merge xs ys)
merge (Skip xs) (y :- ys) = y :- merge xs ys
merge (x :- xs) (Skip ys) = x :- merge xs ys
merge (x :- xs) (y :- ys) = Skip (mergeCarry (mergeTree x y) xs ys)

mergeCarry :: Ord a
           => Tree rk a
           -> Binomial xs rk a
           -> Binomial ys rk a
           -> Binomial (Add True xs ys) rk a
mergeCarry t Nil ys = carryOne t ys
mergeCarry t xs Nil = carryOne t xs
mergeCarry t (Skip xs) (Skip ys) = t :- merge xs ys
mergeCarry t (Skip xs) (y :- ys) = Skip (mergeCarry (mergeTree t y) xs ys)
mergeCarry t (x :- xs) (Skip ys) = Skip (mergeCarry (mergeTree t x) xs ys)
mergeCarry t (x :- xs) (y :- ys) = t :- mergeCarry (mergeTree x y) xs ys

carryOne :: Ord a
         => Tree rk a -> Binomial xs rk a -> Binomial (CarryOne xs) rk a
carryOne t Nil = t :- Nil
carryOne t (Skip xs) = t :- xs
carryOne t (x :- xs) = Skip (carryOne (mergeTree t x) xs)
```

You’ll notice that no proofs are needed: that’s because the merge function itself is the same as the type family, just as `++` for lists was the same as the `Plus` type family.

Of course, this structure is only verified insofar as you believe the type families. It does provide a degree of double-entry, though: any mistake in the type family will have to be mirrored in the merge function to type-check. On top of that, we can write some proofs of properties we might expect:

```
addCommutes :: The [Bool] xs
            -> The [Bool] ys
            -> Add False xs ys :~: Add False ys xs
addCommutes Nily _ = Refl
addCommutes _ Nily = Refl
addCommutes (Cony Falsy xs) (Cony Falsy ys) =
    gcastWith (addCommutes xs ys) Refl
addCommutes (Cony Truey xs) (Cony Falsy ys) =
    gcastWith (addCommutes xs ys) Refl
addCommutes (Cony Falsy xs) (Cony Truey ys) =
    gcastWith (addCommutes xs ys) Refl
addCommutes (Cony Truey xs) (Cony Truey ys) =
    gcastWith (addCommutesCarry xs ys) Refl

addCommutesCarry :: The [Bool] xs
                 -> The [Bool] ys
                 -> Add True xs ys :~: Add True ys xs
addCommutesCarry Nily _ = Refl
addCommutesCarry _ Nily = Refl
addCommutesCarry (Cony Falsy xs) (Cony Falsy ys) =
    gcastWith (addCommutes xs ys) Refl
addCommutesCarry (Cony Truey xs) (Cony Falsy ys) =
    gcastWith (addCommutesCarry xs ys) Refl
addCommutesCarry (Cony Falsy xs) (Cony Truey ys) =
    gcastWith (addCommutesCarry xs ys) Refl
addCommutesCarry (Cony Truey xs) (Cony Truey ys) =
    gcastWith (addCommutesCarry xs ys) Refl
```

Unfortunately, though, this method *does* require proofs (ugly
proofs) for the delete-min operation. One of the issues is truncation:
since the binary digits are stored least-significant-bit first, the same
number can be represented with any number of trailing zeroes. This kept
causing problems for me when it came to subtraction, and adding the
requirement of no trailing zeroes (truncation) to the constructors for
the heap was a pain, requiring extra proofs on merge to show that it
preserves truncation.

### Doubly-Dependent Types

Since some of these properties are much easier to verify on the type-level Peano numbers, one approach might be to convert back and forth between Peano numbers and binary, and use the proofs on Peano numbers instead.

```
type family BintoPeano (xs :: [Bool]) :: Peano where
    BintoPeano '[] = Z
    BintoPeano (False : xs) = BintoPeano xs + BintoPeano xs
    BintoPeano (True : xs) = S (BintoPeano xs + BintoPeano xs)
```

First problem: this requires `UndecidableInstances`. I’d *really* rather not have that turned on, to be honest. In Idris (and Agda), you can *prove* decidability using a number of different methods, but this isn’t available in Haskell yet.
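At the value level, the same conversion is a one-liner (a sketch of mine; `binToInt` is not a library name), which makes the doubling structure easy to check by hand:

```
-- Mirrors BintoPeano: least-significant bit first, each tail is doubled.
binToInt :: [Bool] -> Int
binToInt [] = 0
binToInt (False : xs) = 2 * binToInt xs
binToInt (True  : xs) = 1 + 2 * binToInt xs

main :: IO ()
main = print (binToInt [True, False, True])
```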

Regardless, we can push on.

To go in the other direction, we’ll need to calculate the parity of natural numbers. Taken from the Idris tutorial:

```
data Parity (n :: Peano) where
    Even :: The Peano n -> Parity (n + n)
    Odd :: The Peano n -> Parity (S (n + n))

parity :: The Peano n -> Parity n
parity Zy = Even Zy
parity (Sy Zy) = Odd Zy
parity (Sy (Sy n)) = case parity n of
    Even m -> gcastWith (plusSuccDistrib m m) (Even (Sy m))
    Odd m -> gcastWith (plusSuccDistrib m m) (Odd (Sy m))

plusSuccDistrib :: The Peano n -> proxy m -> n + S m :~: S (n + m)
plusSuccDistrib Zy _ = Refl
plusSuccDistrib (Sy n) p = gcastWith (plusSuccDistrib n p) Refl
```

We need this function on the type-level, though, not the value-level: here, again, we run into trouble. What does `gcastWith` look like on the type-level? As far as I can tell, it doesn’t exist yet (although I haven’t looked deeply into the singletons library).

This idea of doing dependently-typed stuff on the type-level *started* to be possible with `TypeInType`. For instance, we could have defined our binary type as:

```
data Binary :: Peano -> Type where
    O :: Binary n -> Binary (n + n)
    I :: Binary n -> Binary (S (n + n))
    E :: Binary Z
```

And then the binomial heap as:

```
data Binomial (xs :: Binary n) (rk :: Peano) (a :: Type) where
    Nil :: Binomial E n a
    Skip :: Binomial xs (S rk) a -> Binomial (O xs) rk a
    (:-) :: Tree rk a
         -> Binomial xs (S rk) a
         -> Binomial (I xs) rk a
```

What we’re doing here is indexing a type *by an indexed type*.
This wasn’t
possible in Haskell a few years ago. It still doesn’t get us a nice
definition of subtraction, though.

### Using a Typechecker Plugin

It’s pretty clear that this approach gets tedious almost immediately. What’s more, if we want the proofs to be erased, we introduce potential for errors.

The solution? Beef up GHC’s typechecker with a plugin. I first came across this approach in Kenneth Foner’s talk at Compose. He used a plugin that called out to the Z3 theorem prover (from Diatchki 2015); I’ll use a simpler plugin which just normalizes type-literals.

From what I’ve used of these plugins so far, they seem to work really well. They’re very unobtrusive, only requiring a pragma at the top of your file:

`{-# OPTIONS_GHC -fplugin GHC.TypeLits.Normalise #-}`

The plugin is only called when GHC can’t unify two types: this means you don’t get odd-looking error messages in unrelated code (in fact, the error messages I’ve seen so far have been excellent, a real improvement on the standard error messages for type-level arithmetic). Another benefit is that we get to use type-level literals (`Nat`, imported from GHC.TypeLits) rather than the noisy-looking type-level Peano numbers.

```
data Tree n a = Root a (Node n a)

data Node :: Nat -> Type -> Type where
    NilN :: Node 0 a
    (:<) :: {-# UNPACK #-} !(Tree n a)
         -> Node n a
         -> Node (1 + n) a

mergeTree :: Ord a => Tree n a -> Tree n a -> Tree (1 + n) a
mergeTree xr@(Root x xs) yr@(Root y ys)
  | x <= y = Root x (yr :< xs)
  | otherwise = Root y (xr :< ys)

infixr 5 :-
data Binomial :: Nat -> Nat -> Type -> Type where
    Nil :: Binomial n 0 a
    (:-) :: {-# UNPACK #-} !(Tree z a)
         -> Binomial (1 + z) xs a
         -> Binomial z (1 + xs + xs) a
    Skip :: Binomial (1 + z) (1 + xs) a
         -> Binomial z (2 + xs + xs) a
```

This definition also ensures that the binomial heap has no trailing zeroes in its binary representation: the `Skip` constructor can only be applied to a heap bigger than zero.

Since we’re going to be looking at several different heaps, we’ll need a class to represent all of them:

```
class IndexedQueue h a where
    {-# MINIMAL insert, empty, minViewMay, minView #-}
    empty :: h 0 a
    minView :: h (1 + n) a -> (a, h n a)
    singleton :: a -> h 1 a
    singleton = flip insert empty
    insert :: a -> h n a -> h (1 + n) a
    minViewMay :: h n a
               -> (n ~ 0 => b)
               -> (forall m. (1 + m) ~ n => a -> h m a -> b)
               -> b

class IndexedQueue h a =>
      MeldableIndexedQueue h a where
    merge :: h n a -> h m a -> h (n + m) a
```

You’ll need `MultiParamTypeClasses` for this one.

```
mergeB :: Ord a
       => Binomial z xs a -> Binomial z ys a -> Binomial z (xs + ys) a
mergeB Nil ys = ys
mergeB xs Nil = xs
mergeB (Skip xs) (Skip ys) = Skip (mergeB xs ys)
mergeB (Skip xs) (y :- ys) = y :- mergeB xs ys
mergeB (x :- xs) (Skip ys) = x :- mergeB xs ys
mergeB (x :- xs) (y :- ys) = Skip (mergeCarry (mergeTree x y) xs ys)

mergeCarry :: Ord a
           => Tree z a
           -> Binomial z xs a
           -> Binomial z ys a
           -> Binomial z (1 + xs + ys) a
mergeCarry !t Nil ys = carryOne t ys
mergeCarry !t xs Nil = carryOne t xs
mergeCarry !t (Skip xs) (Skip ys) = t :- mergeB xs ys
mergeCarry !t (Skip xs) (y :- ys) = Skip (mergeCarry (mergeTree t y) xs ys)
mergeCarry !t (x :- xs) (Skip ys) = Skip (mergeCarry (mergeTree t x) xs ys)
mergeCarry !t (x :- xs) (y :- ys) = t :- mergeCarry (mergeTree x y) xs ys

carryOne :: Ord a => Tree z a -> Binomial z xs a -> Binomial z (1 + xs) a
carryOne !t Nil = t :- Nil
carryOne !t (Skip xs) = t :- xs
carryOne !t (x :- xs) = Skip (carryOne (mergeTree t x) xs)

instance Ord a => MeldableIndexedQueue (Binomial 0) a where
    merge = mergeB
    {-# INLINE merge #-}

instance Ord a => IndexedQueue (Binomial 0) a where
    empty = Nil
    singleton x = Root x NilN :- Nil
    insert = merge . singleton
```

(`BangPatterns` for this example)

On top of that, it’s very easy to define delete-min:

```
minView xs = case minViewZip xs of
    Zipper x _ ys -> (x, ys)

minViewMay q b f = case q of
    Nil -> b
    _ :- _ -> uncurry f (minView q)
    Skip _ -> uncurry f (minView q)

data Zipper a n rk = Zipper !a (Node rk a) (Binomial rk n a)

skip :: Binomial (1 + z) xs a -> Binomial z (xs + xs) a
skip x = case x of
    Nil -> Nil
    Skip _ -> Skip x
    _ :- _ -> Skip x

data MinViewZipper a n rk where
    Infty :: MinViewZipper a 0 rk
    Min :: {-# UNPACK #-} !(Zipper a n rk) -> MinViewZipper a (n + 1) rk

slideLeft :: Zipper a n (1 + rk) -> Zipper a (1 + n + n) rk
slideLeft (Zipper m (t :< ts) hs) = Zipper m ts (t :- hs)

pushLeft :: Ord a
         => Tree rk a
         -> Zipper a n (1 + rk)
         -> Zipper a (2 + n + n) rk
pushLeft c (Zipper m (t :< ts) hs) =
    Zipper m ts (Skip (carryOne (mergeTree c t) hs))

minViewZip :: Ord a => Binomial rk (1 + n) a -> Zipper a n rk
minViewZip (Skip xs) = slideLeft (minViewZip xs)
minViewZip (t@(Root x ts) :- f) = case minViewZipMay f of
    Min ex@(Zipper minKey _ _) | minKey < x -> pushLeft t ex
    _ -> Zipper x ts (skip f)

minViewZipMay :: Ord a => Binomial rk n a -> MinViewZipper a n rk
minViewZipMay (Skip xs) = Min (slideLeft (minViewZip xs))
minViewZipMay Nil = Infty
minViewZipMay (t@(Root x ts) :- f) = Min $ case minViewZipMay f of
    Min ex@(Zipper minKey _ _) | minKey < x -> pushLeft t ex
    _ -> Zipper x ts (skip f)
```

Similarly, compare the version of the pairing heap with the plugin:

```
data Heap n a where
    E :: Heap 0 a
    T :: a -> HVec n a -> Heap (1 + n) a

data HVec n a where
    HNil :: HVec 0 a
    HCons :: Heap m a -> HVec n a -> HVec (m + n) a

insert :: Ord a => a -> Heap n a -> Heap (1 + n) a
insert x xs = merge (T x HNil) xs

merge :: Ord a => Heap m a -> Heap n a -> Heap (m + n) a
merge E ys = ys
merge xs E = xs
merge h1@(T x xs) h2@(T y ys)
  | x <= y = T x (HCons h2 xs)
  | otherwise = T y (HCons h1 ys)

minView :: Ord a => Heap (1 + n) a -> (a, Heap n a)
minView (T x hs) = (x, mergePairs hs)

mergePairs :: Ord a => HVec n a -> Heap n a
mergePairs HNil = E
mergePairs (HCons h HNil) = h
mergePairs (HCons h1 (HCons h2 hs)) =
    merge (merge h1 h2) (mergePairs hs)
```

To the version without the plugin:

```
data Heap n a where
    E :: Heap Z a
    T :: a -> HVec n a -> Heap (S n) a

data HVec n a where
    HNil :: HVec Z a
    HCons :: Heap m a -> HVec n a -> HVec (m + n) a

class Sized h where
    size :: h n a -> The Peano n

instance Sized Heap where
    size E = Zy
    size (T _ xs) = Sy (size xs)

plus :: The Peano n -> The Peano m -> The Peano (n + m)
plus Zy m = m
plus (Sy n) m = Sy (plus n m)

instance Sized HVec where
    size HNil = Zy
    size (HCons h hs) = size h `plus` size hs

insert :: Ord a => a -> Heap n a -> Heap (S n) a
insert x xs = merge (T x HNil) xs

merge :: Ord a => Heap m a -> Heap n a -> Heap (m + n) a
merge E ys = ys
merge xs E = case plusZero (size xs) of Refl -> xs
merge h1@(T x xs) h2@(T y ys)
  | x <= y = case plusCommutative (size h2) (size xs) of
      Refl -> T x (HCons h2 xs)
  | otherwise = case plusSuccDistrib (size xs) (size ys) of
      Refl -> T y (HCons h1 ys)

minView :: Ord a => Heap (S n) a -> (a, Heap n a)
minView (T x hs) = (x, mergePairs hs)

mergePairs :: Ord a => HVec n a -> Heap n a
mergePairs HNil = E
mergePairs (HCons h HNil) = case plusZero (size h) of Refl -> h
mergePairs (HCons h1 (HCons h2 hs)) =
    case plusAssoc (size h1) (size h2) (size hs) of
        Refl -> merge (merge h1 h2) (mergePairs hs)
```

### Leftist Heaps

The typechecker plugin makes it relatively easy to implement several other heaps: skew, Braun, etc. You’ll need one extra trick to implement a leftist heap, though. Let’s take a look at the unverified version:

```
data Leftist a
    = Leaf
    | Node {-# UNPACK #-} !Int
           a
           (Leftist a)
           (Leftist a)

rank :: Leftist s -> Int
rank Leaf = 0
rank (Node r _ _ _) = r
{-# INLINE rank #-}

mergeL :: Ord a => Leftist a -> Leftist a -> Leftist a
mergeL Leaf h2 = h2
mergeL h1 Leaf = h1
mergeL h1@(Node w1 p1 l1 r1) h2@(Node w2 p2 l2 r2)
  | p1 < p2 =
      if ll <= lr
          then Node (w1 + w2) p1 l1 (mergeL r1 h2)
          else Node (w1 + w2) p1 (mergeL r1 h2) l1
  | otherwise =
      if rl <= rr
          then Node (w1 + w2) p2 l2 (mergeL r2 h1)
          else Node (w1 + w2) p2 (mergeL r2 h1) l2
  where
    ll = rank r1 + w2
    lr = rank l1
    rl = rank r2 + w1
    rr = rank l2
```

In a weight-biased leftist heap, the left branch in any tree must have at least as many elements as the right branch. Ideally, we would encode that in the representation of size-indexed leftist heap:

```
data Leftist n a where
    Leaf :: Leftist 0 a
    Node :: !(The Nat (n + m + 1))
         -> a
         -> Leftist n a
         -> Leftist m a
         -> !(m <= n)
         -> Leftist (n + m + 1) a

rank :: Leftist n s -> The Nat n
rank Leaf = sing
rank (Node r _ _ _ _) = r
{-# INLINE rank #-}
```

Two problems, though: first of all, we need to be able to *compare* the sizes of two heaps in the merge function. If we were using the type-level Peano numbers, this would be too slow. More importantly, though, we need the comparison to provide a *proof* of the ordering, so that we can use it in the resulting `Node` constructor.

### Integer-Backed Type-Level Numbers

In Agda, the Peano type is actually backed by Haskell’s `Integer` at runtime. This allows compile-time proofs to be written about values which are calculated efficiently. We can mimic the same thing in Haskell with a newtype wrapper *around* `Integer` with a phantom `Nat` parameter, if we promise to never put in an integer which has a different value to its phantom parameter. We can make this promise a little more trustworthy if we don’t export the newtype constructor.

```
newtype instance The Nat n where
    NatSing :: Integer -> The Nat n

instance KnownNat n => KnownSing n where
    sing = NatSing $ Prelude.fromInteger $ natVal (Proxy :: Proxy n)
```
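Here is a minimal, self-contained sketch of the pattern (the names `SNat`, `fromKnown`, and `reflect` are illustrative, not the library’s):

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

-- An Integer tagged with a phantom type-level number. In a real
-- library the constructor would not be exported, so 'fromKnown' is
-- the only way to build one, keeping value and index in sync.
newtype SNat (n :: Nat) = SNat Integer

fromKnown :: forall n. KnownNat n => SNat n
fromKnown = SNat (natVal (Proxy :: Proxy n))

reflect :: SNat n -> Integer
reflect (SNat i) = i

main :: IO ()
main = print (reflect (fromKnown :: SNat 5)) -- 5
```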

`FlexibleInstances` is needed for the instance. We can also encode all the necessary arithmetic:

```
infixl 6 +.
(+.) :: The Nat n -> The Nat m -> The Nat (n + m)
(+.) =
    (coerce :: (Integer -> Integer -> Integer)
            -> The Nat n -> The Nat m -> The Nat (n + m))
        (+)
{-# INLINE (+.) #-}
```
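The `coerce` trick works for any operation on the underlying `Integer`s. A self-contained sketch (again with an illustrative `SNat` rather than the library’s `The Nat`):

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

import Data.Coerce (coerce)
import GHC.TypeLits (Nat, type (+))

newtype SNat (n :: Nat) = SNat Integer

-- Addition on the underlying Integers, with the sum reflected in the
-- index. 'coerce' compiles to nothing, so this is exactly as fast as
-- plain Integer addition.
addS :: SNat n -> SNat m -> SNat (n + m)
addS = coerce ((+) :: Integer -> Integer -> Integer)

reflect :: SNat n -> Integer
reflect (SNat i) = i

main :: IO ()
main = print (reflect (addS (SNat 2 :: SNat 2) (SNat 3 :: SNat 3))) -- 5
```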

Finally, the compare function (`ScopedTypeVariables` is needed for this):

```
infix 4 <=.
(<=.) :: The Nat n -> The Nat m -> The Bool (n <=? m)
(<=.) (NatSing x :: The Nat n) (NatSing y :: The Nat m)
  | x <= y =
      case (unsafeCoerce (Refl :: True :~: True) :: (n <=? m) :~: True) of
        Refl -> Truey
  | otherwise =
      case (unsafeCoerce (Refl :: True :~: True) :: (n <=? m) :~: False) of
        Refl -> Falsy
{-# INLINE (<=.) #-}

totalOrder :: p n -> q m -> (n <=? m) :~: False -> (m <=? n) :~: True
totalOrder (_ :: p n) (_ :: q m) Refl =
    unsafeCoerce (Refl :: True :~: True)

type x <= y = (x <=? y) :~: True
```

It’s worth mentioning that all of these functions are somewhat axiomatic: nothing checks these definitions, and any later proofs are only correct relative to them.
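To make the axiomatic flavour concrete, here is a self-contained sketch of an evidence-producing comparison, returning an `Either` of proofs instead of a singleton `Bool` (the names here are illustrative, not the library’s):

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}

import Data.Type.Equality ((:~:) (Refl))
import GHC.TypeLits (Nat, type (<=?))
import Unsafe.Coerce (unsafeCoerce)

newtype SNat (n :: Nat) = SNat Integer

-- Compare the runtime Integers, then assert (via unsafeCoerce) that
-- the type-level comparison agrees. This is an axiom: nothing checks
-- it, so the wrapped Integers must always match their indices.
leS :: SNat n -> SNat m -> Either ((n <=? m) :~: 'True) ((n <=? m) :~: 'False)
leS (SNat x) (SNat y)
  | x <= y    = Left  (unsafeCoerce (Refl :: 'True :~: 'True))
  | otherwise = Right (unsafeCoerce (Refl :: 'True :~: 'True))

describe :: SNat n -> SNat m -> String
describe n m = case leS n m of
    Left  _ -> "first <= second"
    Right _ -> "first > second"

main :: IO ()
main = putStrLn (describe (SNat 2 :: SNat 2) (SNat 3 :: SNat 3))
```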

If we want our merge function to *really* look like the
non-verified version, though, we’ll have to mess around with the syntax
a little.

### A Dependent if-then-else

When matching on a singleton, *within* the case-match, proof
of the singleton’s type is provided. For instance:

```
type family IfThenElse (c :: Bool) (true :: k) (false :: k) :: k where
    IfThenElse True true false = true
    IfThenElse False true false = false

intOrString :: The Bool cond -> IfThenElse cond Int String
intOrString Truey = 1
intOrString Falsy = "abc"
```
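The snippet above uses the library’s `The Bool` singleton; the same example works as a self-contained program with a hand-rolled singleton:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

-- A singleton for Bool: matching on a value reveals the type index.
data SBool (b :: Bool) where
    STrue  :: SBool 'True
    SFalse :: SBool 'False

type family IfThenElse (c :: Bool) (t :: k) (f :: k) :: k where
    IfThenElse 'True  t f = t
    IfThenElse 'False t f = f

-- Within each clause, GHC knows what IfThenElse reduces to.
intOrString :: SBool c -> IfThenElse c Int String
intOrString STrue  = 1
intOrString SFalse = "abc"

main :: IO ()
main = do
    print    (intOrString STrue)  -- 1
    putStrLn (intOrString SFalse) -- abc
```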

In Haskell, since we can overload the if-then-else construct (with `RebindableSyntax`), we can provide the same syntax, while hiding the dependent nature:

```
ifThenElse :: The Bool c -> (c :~: True -> a) -> (c :~: False -> a) -> a
ifThenElse Truey t _ = t Refl
ifThenElse Falsy _ f = f Refl
```
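A self-contained sketch of the overloaded syntax in action (with a hand-rolled `SBool` standing in for `The Bool`):

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RebindableSyntax #-}
{-# LANGUAGE TypeOperators #-}

import Data.Type.Equality ((:~:) (Refl))
import Prelude

data SBool (b :: Bool) where
    STrue  :: SBool 'True
    SFalse :: SBool 'False

-- The overloaded if-then-else: each branch receives a proof of the
-- condition's type-level value.
ifThenElse :: SBool c -> (c :~: 'True -> a) -> (c :~: 'False -> a) -> a
ifThenElse STrue  t _ = t Refl
ifThenElse SFalse _ f = f Refl

-- Ordinary-looking syntax; each branch is handed its proof (here
-- ignored via 'const', but available whenever it is needed).
describe :: SBool c -> String
describe c =
    if c
        then const "condition held"
        else const "condition failed"

main :: IO ()
main = putStrLn (describe STrue)
```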

### Verified Merge

Finally, then, we can write the implementation for merge, which looks
almost *exactly* the same as the non-verified merge:

```
instance Ord a => IndexedQueue Leftist a where
    minView (Node _ x l r _) = (x, merge l r)
    {-# INLINE minView #-}
    singleton x = Node sing x Leaf Leaf Refl
    {-# INLINE singleton #-}
    empty = Leaf
    {-# INLINE empty #-}
    insert = merge . singleton
    {-# INLINE insert #-}
    minViewMay Leaf b _ = b
    minViewMay (Node _ x l r _) _ f = f x (merge l r)

instance Ord a =>
         MeldableIndexedQueue Leftist a where
    merge Leaf h2 = h2
    merge h1 Leaf = h1
    merge h1@(Node w1 p1 l1 r1 _) h2@(Node w2 p2 l2 r2 _)
      | p1 < p2 =
          if ll <=. lr
            then Node (w1 +. w2) p1 l1 (merge r1 h2)
            else Node (w1 +. w2) p1 (merge r1 h2) l1 . totalOrder ll lr
      | otherwise =
          if rl <=. rr
            then Node (w1 +. w2) p2 l2 (merge r2 h1)
            else Node (w1 +. w2) p2 (merge r2 h1) l2 . totalOrder rl rr
      where
        ll = rank r1 +. w2
        lr = rank l1
        rl = rank r2 +. w1
        rr = rank l2
    {-# INLINE merge #-}
```

What’s cool about this implementation is that it has the same performance as the non-verified version (if `Integer` is swapped out for `Int`, that is), and it *looks* pretty much the same. This is very close to static verification for free.

### Generalizing Sort to Parts

The `Sort` type used in the original blog post can be generalized to *any* indexed container.

```
data Parts f g a b r where
    Parts :: (forall n. g (m + n) b -> (g n b, r))
          -> !(f m a)
          -> Parts f g a b r

instance Functor (Parts f g a b) where
    fmap f (Parts g h) =
        Parts (\h' -> case g h' of (remn, r) -> (remn, f r)) h
    {-# INLINE fmap #-}

instance (IndexedQueue f x, MeldableIndexedQueue f x) =>
         Applicative (Parts f g x y) where
    pure x = Parts (\h -> (h, x)) empty
    {-# INLINE pure #-}
    (Parts f (xs :: f m x) :: Parts f g x y (a -> b)) <*>
        Parts g (ys :: f n x) =
            Parts h (merge xs ys)
      where
        h :: forall o. g ((m + n) + o) y -> (g o y, b)
        h v = case f v of { (v', a) ->
              case g v' of { (v'', b) ->
              (v'', a b)}}
    {-# INLINABLE (<*>) #-}
```

This version doesn’t insist that you order the elements of the heap in any particular way: we could use indexed difference lists to reverse a container, or indexed lists to calculate permutations of a container, for instance.
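To see the traversal trick without the indices, here is a simplified, non-indexed analogue in the spirit of the original `Sort` type, using a sorted list in place of a heap. The partial match `\(y : h) -> ...` is precisely what the size index would make total:

```haskell
import Data.List (insert)

-- A queue of collected elements, plus a continuation that later takes
-- elements back out of the (eventually sorted) queue.
data Sort a r = Sort ([a] -> ([a], r)) [a]

instance Functor (Sort a) where
    fmap f (Sort g xs) = Sort (\h -> let (h', r) = g h in (h', f r)) xs

instance Ord a => Applicative (Sort a) where
    pure x = Sort (\h -> (h, x)) []
    Sort f xs <*> Sort g ys =
        Sort
            (\h ->
                let (h', a)  = f h
                    (h'', b) = g h'
                in (h'', a b))
            (foldr insert ys xs) -- merge the two element lists, keeping order

-- Collect every element on the way through, then hand them back in
-- ascending order on the way out.
sortTraversable :: (Traversable t, Ord a) => t a -> t a
sortTraversable t = res
  where
    Sort f xs = traverse (\x -> Sort (\(y : h) -> (h, y)) [x]) t
    (_, res) = f xs

main :: IO ()
main = print (sortTraversable [3, 1, 2 :: Int]) -- [1,2,3]
```

The indexed `Parts` version carries proofs that every pop has a matching push, which is what lets the real library drop the partiality while keeping this shape.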

### Other Uses For Size-Indexed Heaps

I’d be very interested to see any other uses of these indexed heaps, if anyone has any ideas. Potentially they could be used anywhere there is a need for a heap which is known to be of a certain size (a true prime sieve, for instance).

### The Library

I’ve explored all of these ideas here. It has implementations of all the heaps I mentioned, as well as the index-erasing type, and a size-indexed list, for reversing traversables. In the future, I might add things like a Fibonacci heap, or the optimal Brodal/Okasaki heap (Brodal and Okasaki 1996).

### References

Brodal, Gerth Stølting, and Chris Okasaki. 1996. “Optimal Purely Functional Priority Queues.” *Journal of Functional Programming* 6 (6) (November): 839–857. doi:10.1017/S095679680000201X. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.973.

Diatchki, Iavor S. 2015. “Improving Haskell Types with SMT.” In *Proceedings of the 2015 ACM SIGPLAN Symposium on Haskell*, 1–10. Haskell ’15. New York, NY, USA: ACM. doi:10.1145/2804302.2804307. http://yav.github.io/publications/improving-smt-types.pdf.

Eisenberg, Richard A., and Stephanie Weirich. 2012. “Dependently Typed Programming with Singletons.” In *Proceedings of the 2012 Haskell Symposium*, 117–130. Haskell ’12. New York, NY, USA: ACM. doi:10.1145/2364506.2364522. http://cs.brynmawr.edu/~rae/papers/2012/singletons/paper.pdf.

Fredman, Michael L., Robert Sedgewick, Daniel D. Sleator, and Robert E. Tarjan. 1986. “The Pairing Heap: A New Form of Self-Adjusting Heap.” *Algorithmica* 1 (1-4) (January): 111–129. doi:10.1007/BF01840439. http://www.cs.princeton.edu/courses/archive/fall09/cos521/Handouts/pairingheaps.pdf.

Hinze, Ralf. 1999. “Explaining Binomial Heaps.” *Journal of Functional Programming* 9 (1) (January): 93–104. doi:10.1017/S0956796899003317. http://www.cs.ox.ac.uk/ralf.hinze/publications/#J1.

Hinze, Ralf. 2001. “Manufacturing Datatypes.” *Journal of Functional Programming* 11 (5) (September): 493–524. doi:10.1017/S095679680100404X. http://www.cs.ox.ac.uk/ralf.hinze/publications/#J6.

Okasaki, Chris. 1999. “From Fast Exponentiation to Square Matrices: An Adventure in Types.” In *Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP’99), Paris, France, September 27-29, 1999*, 28–35. ACM. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.456.357&rep=rep1&type=pdf.

Wasserman, Louis. 2010. “Playing with Priority Queues.” *The Monad.Reader* 16 (May): 37. https://themonadreader.files.wordpress.com/2010/05/issue16.pdf.