Verified AVL Trees in Haskell and Agda
I’ve been writing a lot of Agda recently, and had the occasion to write a Fenwick tree that did some rebalancing. I went with AVLstyle rebalancing (rather than redblack or trees of bounded balance). I’d written pretty full implementations of the other two before, and the Agda standard library (Danielsson 2018) has an implementation already that I was able to use as a starting point. Also, apparently, AVL trees seem to perform better than redblack trees in practice (Pfaff 2004).
This post will be similar in style to Stephanie Weirich’s talk (2014), which compares an Agda implementation of verified redblack trees to a Haskell one. When there’s two columns of code sidebyside, the lefthand side is Haskell, the right Agda.
The method of constructing the ordering proof is taken from “How to Keep Your Neighbours in Order” (2014) by Conor McBride; the structural proofs are somewhat inspired by the implementation in the Agda standard library, but are mainly my own.
Height
AVL trees are more strictly balanced than redblack trees: the height of neighboring subtrees can differ by at most one. To store the height, we will start as every dependentlytyped program does: with Peano numbers.
Haskell
data N = Z  S N
Agda
data ℕ : Set where
: ℕ
zero : ℕ → ℕ suc
The trees will be balanced one of three possible ways: leftheavy, rightheavy, or even. We can represent these three cases in a GADT in the case of Haskell, or an indexed datatype in the case of Agda:
data Balance :: N > N > N > Type where
L :: Balance (S n) n (S n)
O :: Balance n n n
R :: Balance n (S n) (S n)
data ⟨_⊔_⟩≡_ : ℕ → ℕ → ℕ → Set where
: ∀ {n} → ⟨ suc n ⊔ n ⟩≡ suc n
◿ : ∀ {n} → ⟨ n ⊔ n ⟩≡ n
▽ : ∀ {n} → ⟨ n ⊔ suc n ⟩≡ suc n ◺
Those unfamiliar with Agda might be a little intimidated by the mixfix operator in the balance definition: we’re using it here because the type can be seen of a proof that:
$max(x,y) = z$
Or, using the $\sqcup$ operator:
$(x \sqcup y) = z$
We’ll use this proof in the tree itself, as we’ll need to know the maximum of the height of a node’s two subtrees to find the height of the node. Before we do that, we’ll need a couple helper functions for manipulating the balance:
balr :: Balance x y z > Balance z x z
L = O
balr O = O
balr R = L
balr
ball :: Balance x y z > Balance y z z
L = R
ball O = O
ball R = O ball
: ∀ {x y z} → ⟨ x ⊔ y ⟩≡ z → ⟨ z ⊔ x ⟩≡ z
⃕ = ▽
⃕ ◿ = ▽
⃕ ▽ = ◿
⃕ ◺
: ∀ {x y z} → ⟨ x ⊔ y ⟩≡ z → ⟨ y ⊔ z ⟩≡ z
⃔ = ◺
⃔ ◿ = ▽
⃔ ▽ = ▽ ⃔ ◺
Ordering
Along with the verification of the structure of the tree, we will also want to verify that its contents are ordered correctly. Unfortunately, this property is a little out of reach for Haskell, but it’s 100% doable in Agda. First, we’ll need a way to describe orders on a data type. In Haskell, we might write:
class Ord a where
(==) :: a > a > Bool
(<) :: a > a > Bool
That Bool
throws
away any information gained in the comparison, though: we want to supply
a proof with the result of the comparison. First, equality:
4 ==
infix data (==) :: Type
> Type
> Type where
Refl :: x == x
infix 4 _≡_
data _≡_ {a} {A : Set a}
(x : A)
: A → Set a where
: x ≡ x refl
This is one of the many ways to describe equality in Agda. It’s a type with only one constructor, and it can only be constructed when its two arguments are the same. When we pattern match on the constructor, then, we’re given a proof that whatever things those arguments refer to must be the same.
Next, we need to describe an order. For this, we’ll need two types: the empty type, and the unit type.
data Void :: Type where
data Unit :: Type where Unit :: Unit
data ⊥ : Set where
data ⊤ : Set where ⟨⟩ : ⊤
These are kind of like typelevel Bools, with one extra, powerful
addition: they keep their proof after construction. Because ⊥
has no constructors, if someone tells
you they’re going to give you one, you can be pretty sure they’re lying.
How do we use this? Well, first, on the numbers:
type family (n :: N) < (m :: N)
:: Type where
< Z = Void
x Z < S y = Unit
S x < S y = x < y
_ℕ<_ : ℕ → ℕ → Set
= ⊥
x ℕ< zero = ⊤
zero ℕ< suc y = x ℕ< y suc x ℕ< suc y
Therefore, if we ask for something of type x ℕ< y
(for some x
and
y
), we know that it only exists when x
really
is less than y
(according to the definition above).
For our actual code, we’ll parameterize the whole thing over some abstract key type. We’ll do this using a module (a feature recently added to Haskell, as it happens). That might look something like this:
Key where
signature import Data.Kind
data Key
type family (n :: Key) < (m :: Key)
:: Type
module AVL where
import Key
module AVL
{k r} (Key : Set k)
{_<_ : Rel Key r}
(isStrictTotalOrder
: IsStrictTotalOrder _≡_ _<_)
where
open IsStrictTotalOrder isStrictTotalOrder
(the k
and r
here, as well as the Lift
ing noise below, are to do with
Agda’s universe system, which I’ll try explain in a bit)
Now, the trick for the ordering is to keep a proof that two neighboring values are ordered correctly in the tree at each leaf (as there’s a leaf between every pair of nodes, this is exactly the place you should store such a proof). A problem arises with the extremal leaves in the tree (leftmost and rightmost): each leaf is missing one neighboring value, so how can it store a proof of order? The solution is to affix two elements to our key type which we define as the greatest and least elements of the set.
data Bound = LB  IB Key  UB
4 <:
infix
type family (x :: Bound) <: (y :: Bound)
:: Type where
LB <: LB = Void
LB <: UB = Unit
LB <: IB _ = Unit
UB <: _ = Void
IB _ <: LB = Void
IB _ <: UB = Unit
IB x <: IB y = x < y
infix 5 [_]
data [∙] : Set k where
: [∙]
⌊⌋ ⌈⌉ _] : (k : Key) → [∙]
[
infix 4 _[<]_
_[<]_ : [∙] → [∙] → Set r
= Lift r ⊥
⌊⌋ [<] ⌊⌋ = Lift r ⊤
⌊⌋ [<] ⌈⌉ _ ] = Lift r ⊤
⌊⌋ [<] [ _ = Lift r ⊥
⌈⌉ [<] _ ] [<] ⌊⌋ = Lift r ⊥
[ _ ] [<] ⌈⌉ = Lift r ⊤
[ = x < y [ x ] [<] [ y ]
The Tree Type
After all that, we can get bring back Haskell into the story, and define or tree types:
data Tree :: N
> Type
> Type
> Type where
Leaf :: Tree Z k v
Node :: k
> v
> Balance lh rh h
> Tree lh k v
> Tree rh k v
> Tree (S h) k v
data Tree {v}
(V : Key → Set v)
(l u : [∙]) : ℕ →
Set (k ⊔ v ⊔ r) where
: (l<u : l [<] u) → Tree V l u 0
leaf : ∀ {h lh rh}
node (k : Key)
(v : V k)
(bl : ⟨ lh ⊔ rh ⟩≡ h)
(lk : Tree V l [ k ] lh)
(ku : Tree V [ k ] u rh) →
(suc h) Tree V l u
The two definitions are similar, but have a few obvious differences. The Agda version stores the ordering proof at the leaves, as well as the bounds as indices. Its universe is also different: briefly, universes are one of the ways to avoid Russell’s paradox when you’re dealing with dependent types.
In normal, standard Haskell, we think of types as things that describe values (how quaint!). When you’ve got a list, everything in the list has the same type, and that is good and right.
These days, though, we’re not so constrained:
infixr 5 :
data List xs where
Nil :: List '[]
(:) :: x
> List xs
> List (x : xs)
infixr 5 _፦_
data List′ : List Set → Set where
: List′ []
nil _፦_ : ∀ {x xs}
→ x
→ List′ xs
→ List′ (x ∷ xs)
This can quite happily store elements of different types:
example :: List [Bool, String, Integer]
= True : "true" : 1 : Nil example
: List′ (Bool ∷ String ∷ ℕ ∷ [])
example = true ፦ "true" ፦ 1 ፦ nil example
And look at that bizarrelooking list on the wrong side of “::
”! Types
aren’t just describing values, they’re acting like values themselves.
What type does [Bool, String, Integer]
even have, anyway? Why, [Type]
of
course!
So we see that types can be put in lists, and types have types: the natural question then is:
Type :: ???
Set : ???
And this is where Haskell and Agda diverge: in Haskell, we say Type :: Type
(as the old extension TypeInType
implied), and that’s that. From a certain point of view, we’ve opened
the door to Russell’s paradox (we’ve allowed a set to be a member of
itself). This isn’t an issue in Haskell, though, as the typelevel
language was already inconsistent.
Agda goes another way, saying that Set
(Agda’s
equivalent for Type
) has the
type Set₁
, and Set₁
has the type Set₂
, and so on^{1}.
These different sets are called “universes” and their numbers “levels”.
When we write k ⊔ v ⊔ r
, we’re
saying we want to take the greatest universe level from those three
possible levels: the level of the key, the value, and the relation,
respectively.
Type :: Type
Set : Set₁
Rotations
AVL trees maintain their invariants through relatively simple rotations. We’ll start with the right rotation, which fixes an imbalance of two on the left. Because the size of the tree returned might change, we’ll need to wrap it in a datatype:
data (++?) :: (N > Type)
> (N > Type)
where
Stay :: t n > t ++? n
Incr :: t (S n) > t ++? n
_1?+⟨_⟩ : ∀ {𝓁} (T : ℕ → Set 𝓁) → ℕ → Set 𝓁
= ∃[ inc? ] T (if inc?
T 1?+⟨ n ⟩
then suc n)
else n
pattern 0+_ tr = false , tr
pattern 1+_ tr = true , tr
We could actually have the Agda definition be the same as Haskell’s,
it doesn’t make much difference. I’m mainly using it here to demonstrate
dependent pairs in Agda. The first member of the pair is just a boolean
(increased in height/not increased in height). The second member is a
tree whose height depends on the actual value of the boolean.
The ∃
business is just a fancy
syntax; it also waggles its eyebrows at the way a (dependent) pair of
type (x , y)
means “There exists an x such that y”.
Using this, we can write the type for rightrotation:
rotr :: k
> v
> Tree (S (S rh)) k v
> Tree rh k v
> Tree k v ++? S (S rh)
: ∀ {lb ub rh v} {V : Key → Set v}
rotʳ → (k : Key)
→ V k
→ Tree V lb [ k ] (suc (suc rh))
→ Tree V [ k ] ub rh
→ Tree V lb ub 1?+⟨ suc (suc rh) ⟩
There are two possible cases, single rotation:
┌a ┌a
┌y┤ y┤> │ ┌b
│ └b
x┤ └x┤ └c └c
Node y yv L a b) c =
rotr x xv (Stay (Node y yv O a (Node x xv O b c))
Node y yv O a b) c =
rotr x xv (Incr (Node y yv R a (Node x xv L b c))
(node y yv ◿ a b) c =
rotʳ x xv (node y yv ▽ a (node x xv ▽ b c))
0+ (node y yv ▽ a b) c =
rotʳ x xv (node y yv ◺ a (node x xv ◿ b c)) 1+
And double:
┌a ┌a
┌y┤ ┌y┤
│ │ ┌b │ └b> z┤
│ └z┤
│ └c │ ┌c
x┤ └x┤ └d └d
Node y yv R a
rotr x xv (Node z zv bl b c)) d =
(Stay (Node z zv O
Node y yv (balr bl) a b)
(Node x xv (ball bl) c d)) (
(node y yv ◺ a
rotʳ x xv (node z zv bl b c)) d =
(node z zv ▽
0+ (node y yv (⃕ bl) a b)
(node x xv (⃔ bl) c d))
I won’t bore you with leftrotation: suffice to say, it’s the opposite of rightrotation.
Insertion
Finally, the main event: insertion. Once the above functions have all been defined, it’s not very difficult, as it happens: by and large, the types guide you to the right answer. Of course, this is only after we decided to use the pivotal pragmatism and balance approach.
insertWith :: Ord k
=> (v > v > v)
> k
> v
> Tree h k v
> Tree k v ++? h
Leaf =
insertWith _ v vc Incr (Node v vc O Leaf Leaf)
Node k kc bl tl tr) =
insertWith f v vc (case compare v k of
LT >
case insertWith f v vc tl of
Stay tl' >
Stay (Node k kc bl tl' tr)
Incr tl' > case bl of
L > rotr k kc tl' tr
O > Incr (Node k kc L tl' tr)
R > Stay (Node k kc O tl' tr)
EQ >
Stay (Node v (f vc kc) bl tl tr)
GT >
case insertWith f v vc tr of
Stay tr' >
Stay (Node k kc bl tl tr')
Incr tr' > case bl of
L > Stay (Node k kc O tl tr')
O > Incr (Node k kc R tl tr')
R > rotl k kc tl tr'
: ∀ {l u h v}
insert {V : Key → Set v}
(k : Key)
→ V k
→ (V k → V k → V k)
→ Tree V l u h
→ l < k < u
→ Tree V l u 1?+⟨ h ⟩
(leaf l<u) (l , u) =
insert v vc f (node v vc ▽ (leaf l) (leaf u))
1+ (node k kc bl tl tr) prf
insert v vc f with compare v k
(node k kc bl tl tr) (l , _)
insert v vc f  tri< a _ _ with insert v vc f tl (l , a)
...  0+ tl′ = 0+ (node k kc bl tl′ tr)
...  1+ tl′ with bl
...  ◿ = rotʳ k kc tl′ tr
...  ▽ = 1+ (node k kc ◿ tl′ tr)
...  ◺ = 0+ (node k kc ▽ tl′ tr)
(node k kc bl tl tr) _
insert v vc f  tri≈ _ refl _ =
(node k (f vc kc) bl tl tr)
0+ (node k kc bl tl tr) (_ , u)
insert v vc f  tri> _ _ c with insert v vc f tr (c , u)
...  0+ tr′ = 0+ (node k kc bl tl tr′)
...  1+ tr′ with bl
...  ◿ = 0+ (node k kc ▽ tl tr′)
...  ▽ = 1+ (node k kc ◺ tl tr′)
...  ◺ = rotˡ k kc tl tr′
Conclusion
Overall, I’ve been enjoying programming in Agda. The things I liked and didn’t like surprised me:
 Editor Support

Is excellent. I use spacemacs, and the whole thing worked pretty seamlessly. Proof search and auto was maybe not as powerful as Idris’, although that might be down to lack of experience (note—as I write this, I see you can enable casesplitting in proof search, so it looks like I was right about my lack of experience). In many ways, it was much better than Haskell’s editor support: personally, I have never managed to get casesplitting to work in my Haskell setup, never mind some of the fancier features that you get in Agda.
It’s worth noting that my experience with Idris is similar: maybe it’s something about dependent types?
Of course, I missed lots of extra tools, like linters, code formatters, etc., but the tight integration with the compiler was so useful it more than made up for it.
Also, I’d implore anyone who’s had trouble with emacs before to give spacemacs a go. It works well outofthebox, and has a system for keybinding discovery that actually works.
 Documentation

Pretty good, considering. There are some missing parts (rewriting and telescopes are both stubs on the documentation site), but there seemed to be more fully workedout examples available online for different concepts when I needed to figure them out.
Now, the thing about a lot of these complaints/commendations (especially with regards to tooling and personal setups) is that people tend to be pretty bad about evaluating how difficult finicky tasks like editor setups are. Once you’ve gotten the hang of some of this stuff, you forget that you ever didn’t. Agda is the second dependentlytyped language I’ve really gone for a deepish dive on, and I’ve been using spacemacs for a while, so YMMV.
One area of the language itself that I would have liked to see more on was irrelevance. Looking back at the definition of the tree type, in the Haskell version there’s no singleton storing the height (the balance type stores all the information we need), which means that it definitely doesn’t exist at runtime. As I understand it, that implies that the type should be irrelevant in the equivalent Agda. However, when I actually mark it as irrelevant, everything works fine, except that missing cases warnings start showing up. I couldn’t figure out why: Haskell was able to infer full case coverage without the index, after all. Equality proof erasure, also: is it safe? Consistent?
All in all, I’d encourage more Haskellers to give Agda a try. It’s fun, interesting, and $\mathcal{Unicode}$!
Further Reading
No “deletion is left as an exercise to the reader” here, no sir! Fuller implementations of both the Haskell and Agda versions of the code here are available: first, a pdf of the Agda code with lovely colours is here. The accompanying repository is here, and the equivalent for the Haskell code is here. Of course, if you would rather read something by someone who knows what they’re talking about, please see the
References
My phrasing is maybe a little confusing here. When
Set
“has the type”Set₁
it means thatSet
is inSet₁
, not the other way around.↩︎