Finger Trees in Agda
This Post is Available With Clickable Code Here
This whole post is written with clickable identifiers and ascii art at the above link. I also provide the normal version below in case there are any problems rendering.
As I have talked about previously, a large class
of divide-and conquer algorithms rely on “good” partitioning for the
divide step. If you then want to make the algorithms incremental, you
keep all of those partitions (with their summaries) in some “good”
arrangement (Mu,
Chiang, and Lyu 2016). Several common data structures are
designed around this principle: binomial heaps, for instance, store
partitions of size
.
Different ways of storing partitions favours different use cases: switch
from a binomial heap to a skew binomial, for instance, and you get
constant-time cons
.
The standout data structure in this area is Hinze and Paterson’s
finger tree (Hinze and Paterson 2006).
It caches summaries in a pretty amazing way, allowing for (amortised)
cons
and snoc
and
split
and append
. These features allow it to
be used for a huge variety of things: Data.Sequence
uses it as a random-access sequence, but it can also work as a priority
queue, a search tree, a priority search tree (Hinze 2001), an interval tree, an
order statistic tree…
All of these applications solely rely on an underlying monoid. As a result, I thought it would be a great data structure to implement in Agda, so that you’d get all of the other data structures with minimal effort [similar thinking motivated a Coq implementation; Sozeau (2007)].
Scope of the Verification
There would be no real point to implementing a finger tree in Agda if
we didn’t also prove some things about it. The scope of the proofs I’ve
done so far are intrinsic proofs of the summaries in the tree. In other
words, the type of cons
is as follows:
: ∀ x {xs} → Tree xs → Tree (μ x ∙ xs) cons
This is enough to prove things about the derived data structures (like the correctness of sorting if it’s used as a priority queue), but it’s worth pointing out what I haven’t proved (yet):
- Invariants on the structure (“safe” and “unsafe” digits and so on).
- The time complexity or performance of any operations.
To be honest, I’m not even sure that my current implementation is correct in these regards! I’ll probably have a go at proving them in the future (possibly using Danielsson 2008).
Monoids and Proofs
The bad news is that finger trees are a relatively complex data structure, and we’re going to need a lot of proofs to write a verified version. The good news is that monoids (in contrast to rings) are extremely easy to prove automatically. In this project, I used reflection to do so, but I think it should be possible to do with instance resolution also.
Measures
First things first, we need a way to talk about the summaries of elements we’re interested in. This is captured by the following record type:
record σ {a} (Σ : Set a) : Set (a ⊔ r) where
field
: Σ → 𝓡
μ
open σ ⦃ ... ⦄
𝓡
is the type of the summaries, and μ
means
“summarise”. The silly symbols are used for brevity: we’re going to be
using this thing everywhere, so it’s important to keep it short. Here’s
an example instance for lists:
instance
: ∀ {a} {Σ : Set a} → ⦃ _ : σ Σ ⦄ → σ (List Σ)
σ-List = List.foldr (_∙_ ∘ μ) ε μ ⦃ σ-List ⦄
Working With Setoids
As I mentioned, the tree is going to be verified intrinsically. In other word its type will look something like this:
: 𝓡 → Set Tree
But before running off to define that the obvious way, I should mention that I made the annoying decision to use a setoid (rather than propositional equality) based monoid. This means that we don’t get substitution, making the obvious definition untenable.
I figured out a solution to the problem, but I’m not sure if I’m happy with it. That’s actually the main motivation for writing this post: I’m curious if other people have better techniques for this kind of thing.
To clarify: “this kind of thing” is writing intrinsic
(correct-by-construction) proofs when a setoid is involved. Intrinsic
proofs usually lend themselves to elegance: to prove that
map
preserves a vector’s length, for instance, basically
requires no proof at all:
: ∀ {a b n} {A : Set a} {B : Set b}
map → (A → B)
→ Vec A n
→ Vec B n
= []
map f [] (x ∷ xs) = f x ∷ map f xs map f
But that’s because pattern matching works well with propositional
equality: in the first clause, n
is set to 0
automatically. If we were working with setoid equality, we’d instead
maybe get a proof that n ≈ 0
, and we’d have to figure a way
to work that into the types.
Fibres
The first part of the solution is to define a wrapper type which stores information about the size of the thing it contains:
record μ⟨_⟩≈_ {a} (Σ : Set a) ⦃ _ : σ Σ ⦄ (𝓂 : 𝓡) : Set (a ⊔ r ⊔ m) where
constructor _⇑[_]
field
: Σ
𝓢 : μ 𝓢 ≈ 𝓂 𝒻
Technically speaking, I think this is known as a “fibre”.
μ⟨ Σ ⟩≈ 𝓂
means “There exists a Σ
such that
μ Σ ≈ 𝓂
”. Next, we’ll need some combinators to work
with:
infixl 2 _≈[_]
_≈[_] : ∀ {a} {Σ : Set a} ⦃ _ : σ Σ ⦄ {x : 𝓡} → μ⟨ Σ ⟩≈ x → ∀ {y} → x ≈ y → μ⟨ Σ ⟩≈ y
(xs ≈[ y≈z ]) = 𝓢 xs
𝓢 (xs ≈[ y≈z ]) = trans (𝒻 xs) y≈z 𝒻
This makes it possible to “rewrite” the summary, given a proof of equivalence.
Do Notation
The wrapper on its own isn’t enough to save us from hundreds of lines of proofs. Once you do computation on its contents, you still need to join it up with its original proof of equivalence. In other words, you’ll need to drill into the return type of a function, find the place you used the relevant type variable, and apply the relevant proof from the type above. This can really clutter proofs. Instead, we can use Agda’s new support for do notation to try and get a cleaner notation for everything. Here’s a big block of code:
infixl 2 arg-syntax
record Arg {a} (Σ : Set a) ⦃ _ : σ Σ ⦄ (𝓂 : 𝓡) (f : 𝓡 → 𝓡) : Set (m ⊔ r ⊔ a) where
constructor arg-syntax
field
: Congruent₁ f
⟨f⟩ : μ⟨ Σ ⟩≈ 𝓂
arg open Arg
syntax arg-syntax (λ sz → e₁) xs = xs [ e₁ ⟿ sz ]
infixl 1 _>>=_
_>>=_ : ∀ {a b} {Σ₁ : Set a} {Σ₂ : Set b} ⦃ _ : σ Σ₁ ⦄ ⦃ _ : σ Σ₂ ⦄ {𝓂 f}
→ Arg Σ₁ 𝓂 f
→ ((x : Σ₁) → ⦃ x≈ : μ x ≈ 𝓂 ⦄ → μ⟨ Σ₂ ⟩≈ f (μ x))
→ μ⟨ Σ₂ ⟩≈ f 𝓂
= k (𝓢 xs) ⦃ 𝒻 xs ⦄ ≈[ cng (𝒻 xs) ] arg-syntax cng xs >>= k
First, we define a wrapper for types parameterised by their summary,
with a way to lift an underlying equality up into some expression
f
. The >>=
operator just connects up all
of the relevant bits. An example is what’s needed:
: ∀ {a} {Σ : Set a} ⦃ _ : σ Σ ⦄ → (xs : List Σ) → μ⟨ Tree Σ ⟩≈ μ xs
listToTree = empty ⇑
listToTree [] (x ∷ xs) = [ ℳ ↯ ]≈ do
listToTree
ys ← listToTree xs [ μ x ∙> s ⟿ s ] x ◂ ys
The first line is the base case, nothing interesting going on there.
The second line begins the do-notation, but first applies
[ ℳ ↯ ]≈
: this calls the automated solver. The second line
makes the recursive call, and with the syntax:
[ μ x ∙> s ⟿ s ]
It tells us where the size of the bound variable will end up in the outer expression.