Keeping Formal Verification in Bounds
One of the favorite pastimes of both Haskell and Agda programmers alike is verifying data structures. Among my favorite examples are Red-Black trees (Might 2015; Weirich 2014, verified for balance), perfect binary trees (Hinze 1999), square matrices (Okasaki 1999a), search trees (McBride 2014, verified for balance and order), and binomial heaps (Hinze 1998, verified for structure).
There are many ways to verify data structures. One technique which has had recent massive success is to convert Haskell code to Coq, and then verify the Coq translation: this was the route taken by Breitner et al. (2018) to verify Set and IntSet in containers (a mammoth achievement, in my opinion).
This approach has some obvious advantages: you separate implementation from testing (which is usually a good idea), and your verification language can be different from your implementation language, with each tailored towards its particular domain.
LiquidHaskell (Bakst et al. 2018) (and other tools like it) adds an extra type system to Haskell tailor-made for verification. The added type system (refinement types) is more automated (the typechecker uses Z3), more suited for “invariant”-like things (it supports subtyping), and has a bunch of domain-specific built-ins (reasoning about sets, equations, etc.). I’d encourage anyone who hasn’t used it to give it a try: especially if you’re experienced writing any kind of proof in a language like Agda or Idris, LiquidHaskell proofs are shockingly simple and easy.
What I’m going to focus on today, though, is writing correct-by-construction data structures, using Haskell and Agda’s own type systems. In particular, I’m going to look at how to write verified code that’s fast at runtime. In the other two approaches, we don’t really care about the “speed” of the proofs: sure, it’s nice to speed up compilation and so on, but we don’t have to worry about our implementation suffering at runtime because of some complex proof. When writing correct-by-construction code, though, our task is doubly hard: we now have to worry about the time complexity of both the implementation and the proofs.
In this post, I’m going to demonstrate some techniques to write proofs that stay within the complexity bounds of the algorithms they’re verifying (without cheating!). Along the way I’m going to verify a data structure I haven’t seen verified before (a skew-binary random-access list).
Technique 1: Start With an Unverified Implementation, then Index
To demonstrate the first two techniques, we’re going to write a type for modular arithmetic. For a more tactile metaphor, think of the flip clock:
Each digit can be incremented n times, where n is whatever base you’re using (12 for our flip-clock above). Once you hit the limit, it flips the next digit along. We’ll start with just one digit, and then just string them together to get our full type. With that in mind, our “digit” type has two requirements:
- It should be incrementable.
- Once it hits its limit, it should flip back to zero, and let us know that a flip was performed.
Anyone who’s used a little Agda or Idris will be familiar with the Fin type:

data Fin : ℕ → Set where
  zero : {n : ℕ} → Fin (suc n)
  suc  : {n : ℕ} → Fin n → Fin (suc n)
Fin n is the standard way to encode “numbers smaller than n”. However, for digits they’re entirely unsuitable: since the limit parameter changes on successor, the kind of increment we want is:

try-suc : ∀ {n} → Fin n → Maybe (Fin n)
try-suc (suc x) = Maybe.map suc (try-suc x)
try-suc {suc n} zero with n
... | zero  = nothing
... | suc _ = just (suc zero)

suc-flip : ∀ {n} → Fin n → Fin n × Bool
suc-flip {suc n} x = maybe (_, false) (zero , true) (try-suc x)
suc-flip {zero} ()
If we keep going down this path with proofs in mind, we might next look at the various proofs in the Agda standard library (here, here, and here), and see if we can wrangle them into doing what we want.
For me, though, this wasn’t a fruitful approach. Instead, we’ll try and think of how we’d do this without proving anything, and then see if there’s any place in the resulting data structure we can hang some proof.
So, in an unproven way, let’s start with some numbers. Since we’re going to be incrementing, they’d better be unary:
data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ
And then, for the “flippable” type, we’ll just store the limit alongside the value:
record Flipper : Set where
  constructor _&_
  field
    val : ℕ
    lim : ℕ
We’re not there yet: to check if we’ve gone over the limit, we’ll still have to compare val and lim. Hopefully you can guess the optimization we’ll make: instead of storing the limit, we’ll store the space left:
record Flipper : Set where
  constructor _&_
  field
    space : ℕ
    val   : ℕ
And we get our flip function:
suc-flip : Flipper → Flipper × Bool
suc-flip (zero  & n) = (suc n & zero) , true
suc-flip (suc m & n) = (m & suc n) , false
When there’s no space left, the digit must be maximal (9 in decimal, for instance), so it’ll be one less than the base. That lets us stick it in for the base, rather than recalculating. In the other case, we just take one from the space left, and add it to the value.
So, to “prove” this implementation, we might first reach for an equality proof that val + space is equal to your base. Don’t! Both val and space are inductive structures, which could be giving us information on every application of suc! Let’s set our sights on val and see how we can hang our proofs off of it.
We’re going to upgrade our Peano number with some information, which means that our resulting type is going to look an awful lot like a Peano number. In other words, two cases: zero and suc.
data Val _ : Set where
  zero-case : Val _
  suc-case  : Val _ → Val _
For the suc-case, remember we only want to be allowed to increment it when the space left is more than zero. So let’s encode it:
data Val _ : ℕ → Set where
  zero-case : Val _
  suc-case  : ∀ {space} → Val _ (suc space) → Val _ space
And for the zero-case, the space left is just the base. So let’s stick the base into the type as well:
data Val (base : ℕ) : ℕ → Set where
  zero-case : Val base base
  suc-case  : ∀ {space} → Val base (suc space) → Val base space
(We’ve changed around the way “base” works: it’s now one smaller, so to encode base-10 you’d have Val 9 space. You can get back to the other encoding with a simple wrapper; this way just makes things slightly easier from now on.)
Finally, our flipper:
record Flipper (base : ℕ) : Set where
  constructor _&_
  field
    space : ℕ
    val   : Val base space

suc-flip : ∀ {n} → Flipper n → Flipper n × Bool
suc-flip (zero  & m) = (_ & zero-case) , true
suc-flip (suc n & m) = (n & suc-case m) , false
Great! Everything works.
You may have noticed that the Val type is actually a proof for _≥_ in disguise:

data _≥_ (m : ℕ) : ℕ → Set where
  m≥m : m ≥ m
  m≥p : ∀ {n} → m ≥ suc n → m ≥ n
And the flipper itself is just an existential in disguise:
Flipper : ℕ → Set
Flipper n = ∃ (n ≥_)

suc-flip : ∀ {n} → Flipper n → Flipper n × Bool
suc-flip (zero  , m) = (_ , m≥m) , true
suc-flip (suc n , m) = (n , m≥p m) , false
Hopefully this explanation will help you understand how to get from the specification to those 8 lines. This technique is going to come in especially handy later when we base data structures off of number systems.
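As an aside, the same trick ports straight to Haskell with GADTs. Here’s a minimal sketch (the names N, Ge, and inc are mine for illustration, not from the code above): the proof that the base is at least the remaining space is itself the digit, with one GePred per increment performed.

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data N = Z | S N

-- Ge m n: a proof that m ≥ n, with the same two constructors as
-- the Agda _≥_ above.
data Ge (m :: N) (n :: N) where
  GeRefl :: Ge m m                -- m ≥ m: the digit is zero
  GePred :: Ge m (S n) -> Ge m n  -- one more increment, one less space

-- Incrementing a digit that provably has space left is one constructor:
inc :: Ge m (S n) -> Ge m n
inc = GePred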
Technique 2: Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth.
For this next trick, we’ll add an extra operation to the flipper type above: conversion from a natural number. We want to be able to do it in O(n) time, and we won’t allow ourselves to change the original type definition. Here’s the type we’re aiming for:
fromNat : ∀ {m} (n : ℕ) → .(m≥n : m ≥ n) → Flipper m
We pass in a proof that the natural number we’re converting from is indeed in range (it’s marked irrelevant so we don’t pay for it). Here’s a non-answer:
fromNat : ∀ {m} (n : ℕ) → {m≥n : m ≥ n} → Flipper m
fromNat n {m≥n} = n , m≥n
While this looks fine, it’s actually the inverse of what we want. We defined the inductive structure to be indicated by the inequality proof itself. Let’s make the desired output explicit:
toNat : ∀ {n m} → n ≥ m → ℕ
toNat m≥m       = zero
toNat (m≥p n≥m) = suc (toNat n≥m)

fromNat-≡ : ∀ {n} m
          → .(n≥m : n ≥ m)
          → Σ[ n-m ∈ Flipper n ] toNat (proj₂ n-m) ≡ m
And finally we can try an implementation:
fromNat-≡ zero    _   = (_ , m≥m) , refl
fromNat-≡ (suc m) n≥m = ??? (fromNat-≡ m (m≥p n≥m))
In the ??? there, we want some kind of successor function. The problem is that we would also need to prove that we can do a successor call. Except we don’t want to do that: proving that there’s space left is an expensive operation, and one we can avoid with another trick: first, we assume that there’s space left.
fromNat-≡ zero    n≥m = (_ , m≥m) , refl
fromNat-≡ (suc n) n≥m with fromNat-≡ n (m≥p n≥m)
... | (suc space , n-1) , x≡m  = (space , m≥p n-1) , cong suc x≡m
... | (zero      , n-1) , refl = ???
But what about the second case? Well, we have to prove this impossible. What if it’s an extremely complex, expensive proof? It doesn’t matter! It will never be run! In contrast to proving the “happy path” correct, if we can confine all of the ugly complex cases to the unhappy paths, we can spend as long as we want proving them impossible without having to worry about runtime cost. Here’s the full fromNat implementation:
fromNat-≡ : ∀ {n} m
          → .(n≥m : n ≥ m)
          → Σ[ n-m ∈ Flipper n ] toNat (proj₂ n-m) ≡ m
fromNat-≡ zero    n≥m = (_ , m≥m) , refl
fromNat-≡ (suc n) n≥m with fromNat-≡ n (m≥p n≥m)
... | (suc space , n-1) , x≡m  = (space , m≥p n-1) , cong suc x≡m
... | (zero      , n≥0) , refl = Irrel.⊥-elim (contra _ zero n≥0 n≥m)
  where
  import Data.Nat.Properties as Prop

  n≱sk+n : ∀ n k {sk+n} → sk+n ≡ suc k ℕ.+ n → n ≥ sk+n → ⊥
  n≱sk+n n k wit (m≥p n≥sk+n) = n≱sk+n n (suc k) (cong suc wit) n≥sk+n
  n≱sk+n n k wit m≥m with Prop.+-cancelʳ-≡ 0 (suc k) wit
  ... | ()

  contra : ∀ n m → (n≥m : n ≥ m) → n ≥ suc (m ℕ.+ toNat n≥m) → ⊥
  contra n m m≥m       n≥st = n≱sk+n n zero (cong suc (Prop.+-identityʳ n)) n≥st
  contra n m (m≥p n≥m) n≥st =
    contra n (suc m) n≥m
      (subst (λ x → n ≥ suc x) (Prop.+-suc m (toNat n≥m)) n≥st)

fromNat : ∀ {n} m → .(n≥m : n ≥ m) → Flipper n
fromNat m n≥m = proj₁ (fromNat-≡ m n≥m)
Technique 3: Make Indices Correct-By-Construction
We’re going to switch into Haskell now, and in particular to functional arrays. These are data structures which aren’t real arrays, but they offer you the kind of interface you’d want from an array in a functional setting. You can’t get better than O(log n) indexing, unfortunately (Ben-Amram and Galil 1992), but often it’s enough.
The first “functional array” we’re going to look at is the nested binary random-access list. It has O(log n) indexing, as you might expect, and amortized O(1) single-threaded cons.
It starts out like a binary random-access list (“random-access list” is another name for “functional array”). You can find a full explanation of the structure in your nearest copy of Purely Functional Data Structures (Okasaki 1999b), but briefly: the structure mimics a binary number, in that it’s a list of “bits”. At each set bit, it stores a tree with 2^i elements, where i is the position of the bit in the list. In this way, every binary number n has an analogous list of “bits” which contains, in total, n elements.
The “nested” part refers to how we’re going to implement the trees. It works a little like this:
data Tree a = Leaf a | Node (Tree (a,a))
You might have to squint at that definition for a second to understand it: instead of storing two trees at the Node constructor (which is what you’d usually do), we store a tree with double the elements. This has two advantages: all of the children have the same number of elements (the size of this tree, for instance, is always some power of 2), and it also cuts down on memory use.
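For instance, a four-element tree in this encoding is two Nodes wrapped around a Leaf of nested pairs (tree4 is a name of mine, for illustration):

tree4 :: Tree Int
tree4 = Node (Node (Leaf ((1, 2), (3, 4))))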
For the binary random-access list, we’ll use the nested encoding of trees to encode the contents of each bit. There’s an implementation of this very thing on Hackage (Komuves and Divianszky 2016), and Okasaki himself wrote something very similar to it (1999a), but we’re going to go a little further than both of those by indexing the type by its size. Here it is:
data Bit = O | I
data Seq ns a where
Nil :: Seq '[] a
Even :: Seq xs (a,a) -> Seq (O : xs) a
Odd :: a -> Seq xs (a,a) -> Seq (I : xs) a
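As a quick sanity check (an example of mine, not from the post), three elements land in the index '[I, I]: one element at the first bit, and a two-element tree at the second.

three :: Seq '[I, I] Int
three = Odd 1 (Odd (2, 3) Nil)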
The operations we’re interested in are cons and uncons: for the indices, they correspond to incrementing and decrementing the numbers, respectively. As such, we’ll need type-level functions for those:
type family Inc (ns :: [Bit]) :: [Bit] where
Inc '[] = '[I]
Inc (O : xs) = I : xs
Inc (I : xs) = O : Inc xs
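Here’s a worked reduction of the family (done by hand) on the little-endian bits of 3:

-- Inc '[I, I]            -- 3
--   = O : Inc '[I]       -- carry past the first set bit
--   = O : O : Inc '[]    -- carry past the second set bit
--   = '[O, O, I]         -- 4, since Inc '[] = '[I]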
And now the cons function:

cons :: a -> Seq ns a -> Seq (Inc ns) a
cons x Nil        = Odd x Nil
cons x (Even xs)  = Odd x xs
cons x (Odd y ys) = Even (cons (x,y) ys)
However, we’re going to run into trouble if we try to write uncons:
type family Dec (ns :: [Bit]) :: [Bit] where
Dec (I : xs) = O : xs
Dec (O : xs) = I : Dec xs
Dec '[] = ???
uncons :: Seq ns a -> (a, Seq (Dec ns) a)
uncons (Odd x xs) = (x, Even xs)
uncons (Even xs) = case uncons xs of
  ((x,y),ys) -> (x, Odd y ys)
uncons Nil = ???
We should be able to write this function without returning a Maybe. Because we statically know the size, we can encode “only nonempty sequences”. The problem is that Seq '[] a isn’t the only empty sequence: there’s also Seq '[O] a, and Seq '[O,O] a, and so on. Our binary number system is redundant, because it allows trailing zeroes.
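To make the redundancy concrete, here are two distinct indices that both describe an empty sequence (empty1 and empty2 are names of mine):

empty1 :: Seq '[O] Int
empty1 = Even Nil

empty2 :: Seq '[O, O] Int
empty2 = Even (Even Nil)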
We could add some kind of proof into the data structure, but that would (again) be expensive. Instead, we can make the index itself correct-by-construction, by choosing a non-redundant representation of binary numbers.
Here’s the trick: instead of having a list of bits, we’re going to have a list of “the distance to the next one”. This eliminates the redundancy, and translates into our data structure like so:
data N = Z | S N
data Nest n ns a where
Odd :: a -> (Seq ns (a,a)) -> Nest Z ns a
Even :: (Nest n ns (a,a)) -> Nest (S n) ns a
data Seq ns a where
Nil :: Seq '[] a
Cons :: Nest n ns a -> Seq (n : ns) a
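The encoding takes a moment to get used to, so here’s a small value-level helper (hypothetical, not part of the post’s code) that computes the number a list of gaps denotes: the head counts the zeroes below the first one-bit, and each later entry counts the zeroes between consecutive one-bits.

-- value [0]    == 1   (a one-bit at position 0)
-- value [1]    == 2   (a one-bit at position 1)
-- value [0, 0] == 3   (one-bits at positions 0 and 1)
value :: [Integer] -> Integer
value = go 1
  where
    go _ []       = 0
    go w (g : gs) = let w' = w * 2 ^ g in w' + go (w' * 2) gs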
Lovely! Crucially for our uncons, we now know that any non-empty list of bits is a non-zero list of bits, so we can type “nonempty sequence” easily:
type family Dec (n :: N) (ns :: [N]) = (r :: [N]) | r -> n ns where
Dec (S n) ns = Z : Dec n ns
Dec Z '[] = '[]
Dec Z (n : ns) = S n : ns
uncons :: Seq (n : ns) a -> (a, Seq (Dec n ns) a)
uncons (Cons xs') = go xs'
  where
    go :: Nest n ns a -> (a, Seq (Dec n ns) a)
    go (Odd x Nil)       = (x, Nil)
    go (Odd x (Cons xs)) = (x, Cons (Even xs))
    go (Even xs) = case go xs of ((x,y),ys) -> (x, Cons (Odd y ys))
We’re still not done, though: here’s our new type family for incrementing things.
type family Inc (ns :: [N]) :: [N] where
Inc '[] = '[Z]
Inc (S n : ns) = Z : n : ns
Inc (Z : ns) = Carry (Inc ns)
type family Carry (ns :: [N]) :: [N] where
Carry '[] = '[]
Carry (n : ns) = S n : ns
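Another worked reduction (by hand) to check the families against ordinary arithmetic:

-- Inc '[Z]              -- 1: a one-bit at position 0
--   = Carry (Inc '[])   -- carry out of the lowest bit
--   = Carry '[Z]
--   = '[S Z]            -- 2: a one-bit at position 1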
The Carry there is ugly, and that ugliness carries into the cons function:
cons :: a -> Seq ns a -> Seq (Inc ns) a
cons x Nil = Cons (Odd x Nil)
cons x' (Cons xs') = go x' xs'
  where
    go :: a -> Nest n ns a -> Seq (Inc (n:ns)) a
    go x (Even xs)         = Cons (Odd x (Cons xs))
    go x (Odd y Nil)       = Cons (Even (Odd (x,y) Nil))
    go x (Odd y (Cons ys)) = carry (go (x,y) ys)

    carry :: Seq ns (a,a) -> Seq (Carry ns) a
    carry Nil       = Nil
    carry (Cons xs) = Cons (Even xs)
To clean it up, we’re going to use another technique.
Technique 4: Provide Information on Indices as Early as Possible
You occasionally see people wonder about the usual definition of addition on Peano numbers:
_+_ : ℕ → ℕ → ℕ
zero  + m = m
suc n + m = suc (n + m)
It’s very simple, with only two equations. So, when someone sees the following error:
couldn't match type n with n + 0
They might be tempted to add it as an equation to the function:
_+_ : ℕ → ℕ → ℕ
zero  + m    = m
n     + zero = n
suc n + m    = suc (n + m)
Similarly, when someone sees the other error commonly found with _+_:
couldn't match type S n + m with n + S m
They’ll add that equation in too! In fact, that particular equation will provide a valid definition of _+_:
_+_ : ℕ → ℕ → ℕ
zero  + m = m
suc n + m = n + suc m
So why is the first definition of + the one almost always used?
Because it maximizes output information from minimal input.
Take the second implementation above, the one with the zero on the right. In this function, we have to look at the second argument in the second clause: in other words, we don’t get to find out about the output until we’ve looked at both n and m. In the usual definition, if you know the first argument is suc something, you also know the output must be suc something.
Similarly with the third implementation: we have to examine the first argument in its entirety before we wrap the output in a constructor. Yes, we can of course prove that they’re all equivalent, but remember: proofs are expensive, and we’re looking for speed here. So the first definition of _+_ is our best bet, since it tells us the most without having to prove anything.
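The same principle shows up at Haskell’s type level, where the definitions become closed type families. A sketch (Plus is my name for it), using the N kind from earlier: with the usual equation order, the head constructor of the result is known as soon as the first argument’s head is.

type family Plus (n :: N) (m :: N) :: N where
  Plus Z     m = m
  Plus (S n) m = S (Plus n m)

-- Plus (S n) m reduces to S (Plus n m) with n and m fully abstract;
-- the other two definitions get stuck on abstract arguments.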
Looking back at our definition of Inc, we can actually provide more information a little sooner:
type family Inc (ns :: [N]) :: [N] where
Inc '[] = '[Z]
Inc (S n : ns) = Z : n : ns
Inc (Z : ns) = Carry (Inc ns)
In all of the outputs, the list is non-empty. We can encode that, by having two different functions for the head and tail of the list:
type family IncHead (ns :: [N]) :: N where
IncHead '[] = Z
IncHead (n : ns) = IncHead' n ns
type family IncHead' (n :: N) (ns :: [N]) :: N where
IncHead' (S n) ns = Z
IncHead' Z ns = S (IncHead ns)
type family IncTail (ns :: [N]) :: [N] where
IncTail '[] = '[]
IncTail (n : ns) = IncTail' n ns
type family IncTail' (n :: N) (ns :: [N]) :: [N] where
IncTail' (S n) ns = n : ns
IncTail' Z ns = IncTail ns
type Inc (ns :: [N]) = IncHead ns : IncTail ns
This tells the typechecker that we’re not returning an empty sequence right away, so we don’t have to pattern-match to prove it later, giving us a more efficient function.
cons :: a -> Seq ns a -> Seq (Inc ns) a
cons x' xs' = Cons (go x' xs')
  where
    go :: a -> Seq ns a -> Nest (IncHead ns) (IncTail ns) a
    go x Nil                = Odd x Nil
    go x (Cons (Even xs))   = Odd x (Cons xs)
    go x (Cons (Odd y ys))  = Even (go (x,y) ys)
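To spell out why the last clause typechecks now (a reduction worked by hand): the scrutinee’s index in that clause has the form Z : ms, and both families take exactly one step on it.

-- IncHead (Z : ms) = IncHead' Z ms = S (IncHead ms)
-- IncTail (Z : ms) = IncTail' Z ms = IncTail ms
-- So the required result type is Nest (S (IncHead ms)) (IncTail ms) a,
-- which is exactly what Even (go (x,y) ys) produces.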
Technique 5: Lazy Proofs
Briefly after introducing the binary random-access list, Okasaki describes the skew-binary random-access list. As well as having the same indexing cost as the type above, it supports worst-case O(1) cons. But wait—didn’t the previous structure have O(1) cons? Not really. Unfortunately, in a pure functional setting, imperative-style amortization arguments aren’t always valid.
Say we perform a cons in the worst case, and it takes O(log n) time. In an imperative setting, that’s no problem, because all of the rest of the operations are not going to be on the worst case. In a pure setting, though, the old structure is still sitting around. You can still access it, and you can still get that awful worst-case time.
This is where the skew binary tree comes in. It’s based on the skew binary numbers: these work similarly to binary, but you’re allowed to have (at most) a single 2 digit before any ones. This gives you O(1) incrementing and decrementing, which is what we need here. Let’s get started.
First, our type-level numbers. We’re going to use the sparse encoding as above, but we need some way to encode “you’re only allowed one 2”. The most lightweight way to do it I can think of is by implicitly assuming the second number in the list of gaps is one less than the others. In other words, we encode a 2 with [n, 0, m]. That 0 means that at position n there’s a 2, not a 1.
The corresponding type families for increment and decrement are clearly O(1):
type family Inc (ns :: [N]) = (ms :: [N]) | ms -> ns where
Inc '[] = Z : '[]
Inc (x : '[]) = Z : x : '[]
Inc (x : Z : xs) = S x : xs
Inc (x1 : S x2 : xs) = Z : x1 : x2 : xs
type family Dec (n :: N) (ns :: [N]) = (ms :: [N]) | ms -> n ns where
Dec (S x) xs = x : Z : xs
Dec Z '[] = '[]
Dec Z (x : '[]) = x : '[]
Dec Z (x1 : x2 : xs) = x1 : S x2 : xs
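Here’s how counting goes in this representation (worked out by hand from the families above; recall that a digit at position i weighs 2^(i+1) - 1):

-- 0: '[]
-- 1: '[Z]         -- a 1 at position 0
-- 2: '[Z, Z]      -- a 2 at position 0 (the second Z marks the 2)
-- 3: '[S Z]       -- a 1 at position 1 (weight 3)
-- 4: '[Z, S Z]    -- ones at positions 0 and 1
-- 5: '[Z, Z, Z]   -- a 2 at position 0 and a 1 at position 1
-- 6: '[S Z, Z]    -- a 2 at position 1 (weight 3 + 3)
-- 7: '[S (S Z)]   -- a 1 at position 2 (weight 7)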
We don’t need to split this into head and tail families as we did before because there’s no recursive call: we know all we’re ever going to know about the output following any match on the input.
There’s another problem before we write the implementation: we can’t use the Nest construction that we had before, because then the head would be buried in O(log n) constructors (or thereabouts). Instead, we’re going to have to use GADTs to encode the “gap” type, alongside the relevant tree. This gap type is going to be very similar to the _≥_ proof we had for the modular counters, but with an extra parameter:
data Gap (n :: N) (g :: N) (m :: N) where
Zy :: Gap n Z n
Sy :: Gap n g m -> Gap n (S g) (S m)
Gap n g m means there is a gap of g between n and m. Or, stated another way, it means n + g = m. Its inductive structure mimics the g parameter (it’s basically the g parameter itself with some added information).
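A quick example inhabitant (mine): a gap of two pins the third index down to two successors of the first.

gapTwo :: Gap n (S (S Z)) (S (S n))
gapTwo = Sy (Sy Zy)   -- n + 2 = S (S n)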
With all of that together, here’s the definition of the array itself:
type family Tree (n :: N) (a :: Type) where
Tree Z a = a
Tree (S n) a = Node n a
data Node n a = Node a (Tree n a) (Tree n a)
data SeqTail (n :: N) (ns :: [N]) (a :: Type) where
NilT :: SeqTail n '[] a
ConsT :: Gap n g m
-> Tree m a
-> SeqTail (S m) ms a
-> SeqTail n (g : ms) a
data Seq (ns :: [N]) (a :: Type) where
Nil :: Seq '[] a
Cons :: Gap Z g n
-> Tree n a
-> SeqTail n ns a
-> Seq (g : ns) a
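To get a feel for the layout (an example of mine): three is a single 1-digit at position 1, so a three-element sequence is one complete tree of size 3 and an empty tail.

three :: Seq '[S Z] Int
three = Cons (Sy Zy) (Node 1 2 3) NilT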
The cons operation again mimics the increment function, but there’s one final snag before it’ll typecheck:
cons :: a -> Seq ns a -> Seq (Inc ns) a
cons x Nil = Cons Zy x NilT
cons x (Cons zn y NilT) = Cons Zy x (ConsT zn y NilT)
cons x (Cons zn y1 (ConsT Zy y2 ys)) = Cons (Sy zn) (Node x y1 y2) ys
cons x (Cons zn y1 (ConsT (Sy nm) y2 ys)) =
  Cons Zy x (ConsT zn y1 (ConsT ??? y2 ys))
On the final line, the ??? is missing. In the unverified version, nm would slot right in there. Here, though, if we try it we get an error, which basically amounts to:

Gap n g m /= Gap (S n) g (S m)
At this point, I’d usually throw out the inductive-style proof, and replace it with a proof of equality, which I’d aggressively erase in all of the functions. I said at the beginning I wouldn’t cheat, though, so here’s what I’ll do instead:
gapr :: Gap n g m -> Gap (S n) g (S m)
gapr Zy       = Zy
gapr (Sy pnm) = Sy (gapr pnm)

cons :: a -> Seq ns a -> Seq (Inc ns) a
cons x Nil = Cons Zy x NilT
cons x (Cons zn y NilT) = Cons Zy x (ConsT zn y NilT)
cons x (Cons zn y1 (ConsT Zy y2 ys)) = Cons (Sy zn) (Node x y1 y2) ys
cons x (Cons zn y1 (ConsT (Sy nm) y2 ys)) =
  Cons Zy x (ConsT zn y1 (ConsT (gapr nm) y2 ys))
At first glance, we’ve lost the complexity bounds. That gapr operation is O(log n) (or something), and we’re performing it pretty frequently. We might keep the amortized bounds, but didn’t we just see that those aren’t really valid in a pure setting?
That would all be true, if it weren’t for laziness. Because we delay the evaluation of gapr, we won’t have to pay for it all in one big thunk. In fact, because it’s basically a unary number, we only have to pay for one part of it at a time. I haven’t yet fully worked out the proofs, but I’m pretty sure we’re guaranteed worst-case O(1) time here too.
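To see why laziness rescues the bound, note that gapr is productive (this is the intuition, not a formal argument):

-- gapr (Sy g) = Sy (gapr g): the outer Sy is available after one step,
-- and gapr g stays an unevaluated thunk until someone matches on it.
-- A consumer that walks the gap one constructor at a time therefore
-- pays O(1) per step instead of the whole spine at once.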
Technique 6: When All Else Fails, Prove it Later
About a year ago, I tried to write a verified version of binomial heaps, which could then be used for sorting traversable containers. Unfortunately, I couldn’t figure out how to write delete-min, and gave up. I did recognize that the redundancy of the binary representation was a problem, but I couldn’t figure out much more than that.
Now, though, we have a new non-redundant representation of binary numbers, and some handy techniques to go along with it.
Unfortunately, I ran into a similar roadblock in the implementation. Here’s the point where I was stuck:
data Zipper a n xs = Zipper a (Node n a) (Binomial n xs a)

slideLeft :: Zipper a (S n) xs -> Zipper a n (Z : xs)
slideLeft (Zipper m (t :< ts) hs) = Zipper m ts (Cons (Odd t hs))

minView :: Ord a => Binomial n (x : xs) a -> (a, Binomial n (Decr x xs) a)
minView (Cons xs') = unZipper (go xs')
  where
    unZipper (Zipper x _ xs) = (x, xs)

    go :: forall a n x xs. Ord a => Nest n x xs a -> Zipper a n (Decr x xs)
    go (Even xs) = slideLeft (go xs)
    go (Odd (Root x ts) Empty) = Zipper x ts Empty
    go (Odd c@(Root x ts) (Cons xs)) =
      case go xs of
        (Zipper m (t' :< _) hs)
          | m >= x -> Zipper x ts (Cons (Even xs))
          | otherwise ->
              Zipper m ts
                (case hs of
                  Empty    -> Cons (Even (Odd (mergeTree c t') Empty))
                  Cons hs' -> Cons (Even (carryOneNest (mergeTree c t') hs')))
The last two lines don’t typecheck! The errors were complex, but effectively they stated:

Could not deduce: x : xs ~ [Z]
  from the context: Decr x xs ~ []

and:

Could not deduce: x : xs ~ Inc (y : ys)
  from the context: Decr x xs ~ y : ys
The thing is, all of those look pretty provable. So, for this technique, we first figure out what proofs we need, and assume we have them. This means changing minView to the following:
data Zipper a n xs = Zipper a (Node n a) (Binomial n xs a)

slideLeft :: Zipper a (S n) xs -> Zipper a n (Z : xs)
slideLeft (Zipper m (t :< ts) hs) = Zipper m ts (Cons (Odd t hs))

minView :: Ord a => Binomial n (x : xs) a -> (a, Binomial n (Decr x xs) a)
minView (Cons xs') = unZipper (go xs')
  where
    unZipper (Zipper x _ xs) = (x, xs)

    go :: forall a n x xs. Ord a => Nest n x xs a -> Zipper a n (Decr x xs)
    go (Even xs) = slideLeft (go xs)
    go (Odd (Root x ts) Empty) = Zipper x ts Empty
    go (Odd c@(Root x ts) (Cons xs)) =
      case go xs of
        (Zipper m (t' :< _) (hs :: Binomial (S n) (Decr y ys) a))
          | m >= x -> Zipper x ts (Cons (Even xs))
          | otherwise ->
              Zipper m ts
                (case hs of
                  Empty    -> gcastWith (lemma1 @y @ys Refl)
                    (Cons (Even (Odd (mergeTree c t') Empty)))
                  Cons hs' -> gcastWith (lemma2 @y @ys Refl)
                    (Cons (Even (carryOneNest (mergeTree c t') hs'))))
And writing in the templates for our lemmas:
lemma1 :: forall x xs. Decr x xs :~: '[] -> x : xs :~: Z : '[]
lemma1 = _

lemma2 :: forall x xs y ys. Decr x xs :~: y : ys -> x : xs :~: Inc (y : ys)
lemma2 = _
We now need to provide the implementations for lemma1 and lemma2. With this approach, even if we fail to do the next steps, we can cop out here and sub in unsafeCoerce Refl in place of the two proofs, maintaining the efficiency. We won’t need to, though!
Unlike in Agda, the types for those proofs won’t be around at runtime, so we won’t have anything to pattern match on. We’ll need to look for things in the surrounding area which could act like singletons for the lemmas.
It turns out that the xs and hs' floating around can do exactly that: they tell us about the type-level y and x. So we just pass them to the lemmas (where they’re needed). This changes the last 4 lines of minView to:
Empty    -> gcastWith (lemma1 Refl xs)
  (Cons (Even (Odd (mergeTree c t') Empty)))
Cons hs' -> gcastWith (lemma2 Refl xs hs')
  (Cons (Even (carryOneNest (mergeTree c t') hs')))
Now, we just have to fill in the lemmas! If we were lucky, they’d actually be constant-time.
lemma1 :: forall x xs n a. Decr x xs :~: '[]
       -> Nest n x xs a
       -> x : xs :~: Z : '[]
lemma1 Refl (Odd _ Empty) = Refl

lemma2 :: forall x xs y ys n a.
          Decr x xs :~: y : ys
       -> Nest n x xs a
       -> Nest n y ys a
       -> x : xs :~: Inc (y : ys)
lemma2 Refl (Even (Odd _ Empty)) (Odd _ Empty) = Refl
lemma2 Refl (Odd _ (Cons _)) (Even _) = Refl
lemma2 Refl (Even xs) (Odd _ (Cons ys)) =
  gcastWith (lemma2 Refl xs ys) Refl
If they had been constant-time, that would have let us throw them out: each proof would essentially show you what cases needed to be scrutinized to satisfy the typechecker. You then just scrutinize those cases in the actual function, and it should all typecheck.
As it is, lemma2 is actually ok. It does cost O(log n), but so does carryOneNest: we’ve maintained the complexity! We could stop here, satisfied.
There’s another option, though, one that I picked up from Stephanie Weirich’s talk (2017): you thread the requirement through the function as an equality constraint. It won’t always work, but when your function’s call graph matches that of the proof, the constraint will indeed be satisfied, with no runtime cost. In this case, we can whittle down the proof obligation to the following:
Inc (Decr x xs) ~ (x : xs)
Now we change the recursive go into continuation-passing style, and add that constraint to its signature, and everything works!
minView :: Ord a => Binomial n (x : xs) a -> (a, Binomial n (Decr x xs) a)
minView (Cons xs') = go xs' \(Zipper x _ xs) -> (x,xs)
  where
    go :: Ord a
       => Nest n x xs a
       -> (Inc (Decr x xs) ~ (x : xs) => Zipper a n (Decr x xs) -> b) -> b
    go (Even xs) k =
      go xs \(Zipper m (t :< ts) hs) -> k (Zipper m ts (Cons (Odd t hs)))
    go (Odd (Root x ts) Empty) k = k (Zipper x ts Empty)
    go (Odd c@(Root x cs) (Cons xs)) k =
      go xs \case
        Zipper m _ _ | m >= x ->
          k (Zipper x cs (Cons (Even xs)))
        Zipper m (t :< ts) Empty ->
          k (Zipper m ts (Cons (Even (Odd (mergeTree c t) Empty))))
        Zipper m (t :< ts) (Cons hs) ->
          k (Zipper m ts (Cons (Even (carryOneNest (mergeTree c t) hs))))
Conclusion
As I mentioned in the beginning, a huge amount of this stuff is much easier using other systems. On top of that, there’s currently a lot of work being done on dependent type erasure, so that proofs like the above don’t even exist at runtime. In other words, there’s a chance that all of these techniques will soon be useless!
Efficient proof-carrying code makes for an interesting puzzle, though, even if it is a bit of a hair shirt.
Code
Fuller implementations of the structures here are in this git repository.