Part 3 of a 3-part series on Balanced Folds

Tags: Haskell

When we started the series, we wanted to find a “better” fold: one that was more balanced than either `foldl`

or `foldr`

(in its placement of parentheses). Both of these are about as unbalanced as you can get:

The first better fold I found was Jon Fairbairn’s simple `treeFold`

:

```
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold f = go
where
go x [] = x
go a (b:l) = go (f a b) (pairMap l)
pairMap (x:y:rest) = f x y : pairMap rest
pairMap xs = xs
>>> treeFold (+) 0 [1,2,3]
(0 + 1) + (2 + 3)
```

Already this function was kind of magical: if your binary operator merges two sorted lists, `foldr`

will give you insertion sort, whereas `treeFold`

will give you merge sort; for summing floats, `treeFold`

has a lower error growth than `sum`

. By dividing up the work better, we were able to improve the characteristics of many algorithms automatically. We also saw that it could easily be made parallel:

```
parseq :: a -> b -> b
parseq a b =
runST
(bool (par a b) (seq a b) <$>
unsafeIOToST (liftA2 (>) numSparks getNumCapabilities))
treeFoldParallel :: (a -> a -> a) -> a -> [a] -> a
treeFoldParallel f =
treeFold
(\l r ->
r `parseq` (l `parseq` f l r))
```

In the next post, we saw how we could make the fold incremental, by using binary number representations for data structures. This let us do 2 things: it meant the fold was structurally terminating, so it would pass the termination checker (efficiently) in languages like Agda or Idris, and it meant we could write `scanl`

using the fold. The `scanl`

was also efficient: you could run the fold at any point in $\mathcal{O}(\log n)$ time, and work would be shared between subsequent runs. Effectively, this let us use it to solve greedy optimization problems. We also saw how it was effectively constructing an implicit binomial priority queue under the hood, and how it exploited laziness to get sharing.

I’ve gotten huge mileage out of this fold and the general ideas about it, and today I’m going to show one more use of it. We’re going to improve some of the asymptotics of the data structure presented in Lampropoulos, Spector-Zabusky, and Foner (2017).

The paper opens with the problem:

Suppose you have an urn containing two red balls, four green balls, and three blue balls. If you take three balls out of the urn, what is the probability that two of them are green?

If you were to take just *one* ball out of the earn, calculating the associated probabilities would be easy. Once you get to the second, though, you have to update the previous probability *based on what ball was removed*. In other words, we need to be able to dynamically update the distribution.

Using lists, this would obviously become an $\mathcal{O}(n)$ operation. In the paper, an almost-perfect binary tree is used. This turns the operation into one that’s $\mathcal{O}(\log n)$. The rest of the operations have the following complexities:

Operation | Complexity |
---|---|

`insert` |
$\mathcal{O}(\log n)$ |

`remove` |
$\mathcal{O}(\log n)$ |

`fromList` |
$\mathcal{O}(n)$ |

As a quick spoiler, the improved version presented here has these complexities:

Operation | Complexity |
---|---|

`insert` |
$\mathcal{O}(1)$ |

`remove` |
$\mathcal{O}(\log n)$ |

`merge` |
$\mathcal{O}(\log n)$ |

`fromList` |
$\mathcal{O}(n)$ |

We add another operation (`merge`

), which means that the new structure is viable as an instance of `Alternative`

, `Monad`

, and so on, making it an efficient monad for weighted backtracking search.

The key thing to notice in the paper which will let us improve the structure is that what they’re designing is actually a *priority queue*. Well, a weird looking priority queue, but a priority queue nonetheless.

Think about it like a max-priority queue (pop returns the largest element first), with a degree of “randomization”. In other words, when you go to do a pop, all of the comparisons between the ordering keys (the weights in this case) sprinkles some randomness into the equation, meaning that instead of `1 < 2`

returning `True`

, it returns `True`

$\frac{2}{3}$ of the time, and `False`

the other $\frac{1}{3}$.

This way of doing things means that not every priority queue is suitable: we want to run comparisons at `pop`

time (not `insert`

), so a binary heap (for instance) won’t do. At branches (non-leaves), the queue will only be allowed store *summaries* of the data, not the “max element”.

The one presented in the paper is something like a Braun priority queue: the $\mathcal{O}(n)$ `fromList`

implementation is reminiscent of the one in Okasaki (1997).

So what priority queue can we choose to get us the desired efficiency? Why, a binomial one of course!

The urn structure itself looks a lot like a binomial heap:

```
data Tree a
= Tree
{ weight :: {-# UNPACK #-} !Word
, branch :: Node a
}
data Node a
= Leaf a
| Branch (Tree a) (Node a)
data Heap a
= Nil
| Cons {-# UNPACK #-} !Word (Tree a) (Heap a)
data Urn a =
Urn {-# UNPACK #-} !Word
!(Heap a)
```

By avoiding the usual `Skip`

constructors you often see in a binomial heap we save a huge amount of space. Instead, we store the “number of zeroes before this bit”. Another thing to point out is that only left branches in the trees store their weight: the same optimization is made in the paper.

Insertion is not much different from insertion for a usual binomial priority queue, although we don’t need to do anything to merge the trees:

```
insertHeap :: Word -> a -> Heap a -> Heap a
insertHeap i' x' = go 0 (Tree i' (Leaf x'))
where
go !i x Nil = Cons i x Nil
go !i x (Cons 0 y ys) = go (i+1) (mergeTree x y) ys
go !i x (Cons j y ys) = Cons i x (Cons (j-1) y ys)
mergeTree :: Tree a -> Tree a -> Tree a
mergeTree xs ys =
Tree
(weight xs + weight ys)
(Branch xs (branch ys))
insert :: Word -> a -> Urn a -> Urn a
insert i x (Urn w xs) = Urn (w+i) (insertHeap i x xs)
```

We *could* potentially get insertion from amortized $\mathcal{O}(1)$ to worst-case $\mathcal{O}(1)$ by using skew binary instead of binary (in fact I am almost sure it’s possible), but then I think we’d lose the efficient merge. I’ll leave exploring that for another day.

To get randomness, we’ll write a very simple class that encapsulates only what we need:

You can later instantiate this to whatever random monad you end up using. (The same approach was taken in the paper, although we only require `Functor`

here, not `Monad`

).

Sampling (with replacement) first randomly chooses a tree from the top-level list, and then we drill down into that tree with binary search.

```
sample :: (Functor m, Sample m) => Urn a -> Maybe (m a)
sample (Urn _ Nil) = Nothing
sample (Urn w' (Cons _ x' xs')) = Just (fmap (go x' xs') (inRange 0 (w' - 1)))
where
go x Nil !w = go' w (branch x)
go x (Cons _ y ys) !w
| w < weight x = go' w (branch x)
| otherwise = go y ys (w - weight x)
go' !_ (Leaf x) = x
go' !i (Branch xs ys)
| i < weight xs = go' i (branch xs)
| otherwise = go' (i - weight xs) ys
```

So we’re off to a good start, but `remove`

is a complex operation. We take the same route taken in the paper: first, we perform an “uncons”-like operation, which pops out the last inserted element. Then, we randomly choose a point in the tree (using the same logic as in `sample`

), and replace it with the popped element^{1}.

```
remove :: (Functor m, Sample m) => Urn a -> Maybe (m ((a, Word), Urn a))
remove (Urn w hp) = fmap go' (Heap.uninsert hp)
where
go' (vw,v,hp') = fmap (`go` hp') (inRange 0 (w-1))
where
go !_ Nil = ((v, vw), Urn 0 Nil)
go !rw vs@(Cons i' x' xs')
| rw < vw = ((v, vw), Urn (w - vw) vs)
| otherwise = replace (rw - vw) i' x' xs'
(\ys yw y -> ((y, yw), Urn (w - yw) ys))
replace !rw i x Nil k = replaceTree rw x (\t -> k (Cons i t Nil))
replace !rw i x xs@(Cons j y ys) k
| rw < weight x = replaceTree rw x (\t -> k (Cons i t xs))
| otherwise = replace (rw - weight x) j y ys (k . Cons i x)
replaceTree !_ (Tree tw (Leaf x)) k = k (Tree vw (Leaf v)) tw x
replaceTree !rw (Tree tw (Branch xs ys)) k
| rw < weight xs = replaceTree rw xs
(\t -> k (Tree (tw + (weight t - weight xs)) (Branch t ys)))
| otherwise = replaceTree (rw - weight xs)
(Tree (tw - weight xs) ys)
(\t -> k (Tree (weight xs + weight t) (Branch xs (branch t))))
```

Merge is the same as on binomial heaps:

```
mergeHeap :: Heap a -> Heap a -> Heap a
mergeHeap Nil = id
mergeHeap (Cons i' x' xs') = merger i' x' xs'
where
merger !i x xs Nil = Cons i x xs
merger !i x xs (Cons j y ys) = merge' i x xs j y ys
merge' !i x xs !j y ys = case compare i j of
LT -> Cons i x (merger (j-i-1) y ys xs)
GT -> Cons j y (merger (i-j-1) x xs ys)
EQ -> mergec (succ i) (mergeTree x y) xs ys
mergec !p !t Nil = carryLonger p t
mergec !p !t (Cons i x xs) = mergecr p t i x xs
mergecr !p !t !i x xs Nil = carryLonger' p t i x xs
mergecr !p !t !i x xs (Cons j y ys) = mergec' p t i x xs j y ys
mergec' !p t !i x xs !j y ys = case compare i j of
LT -> mergecr'' p t i x xs (j-i-1) y ys
GT -> mergecr'' p t j y ys (i-j-1) x xs
EQ -> Cons p t (mergec i (mergeTree x y) xs ys)
mergecr'' !p !t 0 x xs !j y ys = mergecr (p+1) (mergeTree t x) j y ys xs
mergecr'' !p !t !i x xs !j y ys = Cons p t (Cons (i-1) x (merger j y ys xs))
carryLonger !i !t Nil = Cons i t Nil
carryLonger !i !t (Cons j y ys) = carryLonger' i t j y ys
carryLonger' !i !t 0 y ys = carryLonger (succ i) (mergeTree t y) ys
carryLonger' !i !t !j y ys = Cons i t (Cons (j-1) y ys)
merge :: Urn a -> Urn a -> Urn a
merge (Urn i xs) (Urn j ys) = Urn (i+j) (mergeHeap xs ys)
```

Again, the cleverness of all the tree folds is that they intelligently batch summarizing operations, allowing you to efficiently so prefix-scan-like operations that exploit sharing.

The bare-bones version just uses binary numbers: you can upgrade the `cons`

operation to worst-case constant-time if you use *skew* binary. Are there other optimizations? Yes! What if we wanted to stick something on to the *other* end, for instance? What if we wanted to reverse?

If you figure out a way to do *all* these optimizations, and put them into one big data structure, you get the mother-of-all “batching” data structures: the finger tree. This is the basis for Haskell’s Data.Sequence, but it can also implement priority queues, urns (I’d imagine), fenwick-tree-like structures, and more.

First and foremost, I should test the above implementations! I’m pretty confident the asymptotics are correct, but I’m certain the implementations have bugs.

The efficient `merge`

is intriguing: it means that `Urn`

could conceivably be `Alternative`

, `MonadPlus`

, etc. I have yet to see a use for that, but it’s interesting nonetheless! I’m constantly looking for a way to express something like Dijkstra’s algorithm algebraicly, using the usual `Alternative`

combinators; I don’t know if this is related.

The other interesting point is that, for this to be an instance of `Applicative`

, it would need some analogue for multiplication for the weights. I’m not sure what that should be.

This is inherently *max*-priority. It’s not obvious how to translate what we have into a min-priority queue version.

Finally, it might be worth trying out different priority queues (a pairing heap is very similar in structure to this). Also, we could rearrange the weights so that larger ones are higher in each tree: this might give a performance boost.

Lampropoulos, Leonidas, Antal Spector-Zabusky, and Kenneth Foner. 2017. “Ode on a random urn (functional pearl).” In, 26–37. ACM Press. doi:10.1145/3122955.3122959. https://www.cis.upenn.edu/~llamp/pdf/urns.pdf.

Okasaki, Chris. 1997. “Three Algorithms on Braun Trees.” *Journal of Functional Programming* 7 (6) (November): 661–666. doi:10.1017/S0956796897002876. https://www.eecs.northwestern.edu/~robby/courses/395-495-2013-fall/three-algorithms-on-braun-trees.pdf.

There’s one extra step I haven’t mentioned: we also must allow the first element (the last inserted) to be chosen, so we run the random-number generator once to check if that’s the element we want to choose.↩

Part 2 of a 3-part series on Balanced Folds

Previously I tried to figure out a way to fold lists in a more balanced way. Usually, when folding lists, you’ve got two choices for your folds, both of which are extremely unbalanced in one direction or another. Jon Fairbairn wrote a more balanced version, which looked something like this:

```
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold f = go
where
go x [] = x
go a (b:l) = go (f a b) (pairMap l)
pairMap (x:y:rest) = f x y : pairMap rest
pairMap xs = xs
```

The fold above is kind of magical: for a huge class of algorithms, it kind of “automatically” improves some factor of theirs from $\mathcal{O}(n)$ to $\mathcal{O}(\log n)$. For instance: to sum a list of floats, `foldl' (+) 0`

will have an error growth of $\mathcal{O}(n)$; `treeFold (+) 0`

, though, has an error rate of $\mathcal{O}(\log n)$. Similarly, using the following function to merge two sorted lists:

```
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge (x:xs) ys = go x xs ys
where
go x xs [] = x : xs
go x xs (y:ys)
| x <= y = x : go y ys xs
| otherwise = y : go x xs ys
```

We get either insertion sort ($\mathcal{O}(n^2)$) or merge sort ($\mathcal{O}(n \log n)$) just depending on which fold you use.

I’ll give some more examples later, but effectively it gives us a better “divide” step in many divide and conquer algorithms.

As it was such a useful fold, and so integral to many tricky algorithms, I really wanted to have it available in Agda. Unfortunately, though, the functions (as defined above) aren’t structurally terminating, and there doesn’t *look* like there’s an obvious way to make it so. I tried to make well founded recursion work, but the proofs were ugly and slow.

However, we can use some structures from a previous post: the nested binary sequence, for instance. It has some extra nice properties: instead of nesting the types, we can just apply the combining function.

```
mutual
data Tree {a} (A : Set a) : Set a where
2^_×_+_ : ℕ → A → Node A → Tree A
data Node {a} (A : Set a) : Set a where
⟨⟩ : Node A
⟨_⟩ : Tree A → Node A
module TreeFold {a} {A : Set a} (_*_ : A → A → A) where
infixr 5 _⊛_ 2^_×_⊛_
2^_×_⊛_ : ℕ → A → Tree A → Tree A
2^ n × x ⊛ 2^ suc m × y + ys = 2^ n × x + ⟨ 2^ m × y + ys ⟩
2^ n × x ⊛ 2^ zero × y + ⟨⟩ = 2^ suc n × (x * y) + ⟨⟩
2^ n × x ⊛ 2^ zero × y + ⟨ ys ⟩ = 2^ suc n × (x * y) ⊛ ys
_⊛_ : A → Tree A → Tree A
_⊛_ = 2^ 0 ×_⊛_
⟦_⟧↓ : Tree A → A
⟦ 2^ _ × x + ⟨⟩ ⟧↓ = x
⟦ 2^ _ × x + ⟨ xs ⟩ ⟧↓ = x * ⟦ xs ⟧↓
⟦_⟧↑ : A → Tree A
⟦ x ⟧↑ = 2^ 0 × x + ⟨⟩
⦅_,_⦆ : A → List A → A
⦅ x , xs ⦆ = ⟦ foldr _⊛_ ⟦ x ⟧↑ xs ⟧↓
```

Alternatively, we can get $\mathcal{O}(1)$ cons with the skew array:

```
infixr 5 _⊛_
_⊛_ : A → Tree A → Tree A
x ⊛ 2^ n × y + ⟨⟩ = 2^ 0 × x + ⟨ 2^ n × y + ⟨⟩ ⟩
x ⊛ 2^ n × y₁ + ⟨ 2^ 0 × y₂ + ys ⟩ = 2^ suc n × (x * (y₁ * y₂)) + ys
x ⊛ 2^ n × y₁ + ⟨ 2^ suc m × y₂ + ys ⟩ = 2^ 0 × x + ⟨ 2^ n × y₁ + ⟨ 2^ m × y₂ + ys ⟩ ⟩
```

Using this, a proper and efficient merge sort is very straightforward:

```
data Total {a r} {A : Set a} (_≤_ : A → A → Set r) (x y : A) : Set (a ⊔ r) where
x≤y : ⦃ _ : x ≤ y ⦄ → Total _≤_ x y
y≤x : ⦃ _ : y ≤ x ⦄ → Total _≤_ x y
module Sorting {a r}
{A : Set a}
{_≤_ : A → A → Set r}
(_≤?_ : ∀ x y → Total _≤_ x y) where
data [∙] : Set a where
⊥ : [∙]
[_] : A → [∙]
data _≥_ (x : A) : [∙] → Set (a ⊔ r) where
instance ⌈_⌉ : ∀ {y} → y ≤ x → x ≥ [ y ]
instance ⌊⊥⌋ : x ≥ ⊥
infixr 5 _∷_
data Ordered (b : [∙]) : Set (a ⊔ r) where
[] : Ordered b
_∷_ : ∀ x → ⦃ x≥b : x ≥ b ⦄ → (xs : Ordered [ x ]) → Ordered b
_∪_ : ∀ {b} → Ordered b → Ordered b → Ordered b
[] ∪ ys = ys
(x ∷ xs) ∪ ys = ⟅ x ∹ xs ∪ ys ⟆
where
⟅_∹_∪_⟆ : ∀ {b} → ∀ x ⦃ _ : x ≥ b ⦄ → Ordered [ x ] → Ordered b → Ordered b
⟅_∪_∹_⟆ : ∀ {b} → Ordered b → ∀ y ⦃ _ : y ≥ b ⦄ → Ordered [ y ] → Ordered b
merge : ∀ {b} x y ⦃ _ : x ≥ b ⦄ ⦃ _ : y ≥ b ⦄
→ Total _≤_ x y
→ Ordered [ x ]
→ Ordered [ y ]
→ Ordered b
⟅ x ∹ xs ∪ [] ⟆ = x ∷ xs
⟅ x ∹ xs ∪ y ∷ ys ⟆ = merge x y (x ≤? y) xs ys
⟅ [] ∪ y ∹ ys ⟆ = y ∷ ys
⟅ x ∷ xs ∪ y ∹ ys ⟆ = merge x y (x ≤? y) xs ys
merge x y x≤y xs ys = x ∷ ⟅ xs ∪ y ∹ ys ⟆
merge x y y≤x xs ys = y ∷ ⟅ x ∹ xs ∪ ys ⟆
open TreeFold
sort : List A → Ordered ⊥
sort = ⦅ _∪_ , [] ⦆ ∘ map (_∷ [])
```

It would be nice if we could verify these optimizated versions of folds. Luckily, by writing them using `foldr`

, we’ve stumbled into well-trodden ground: the *foldr fusion law*. It states that if you have some transformation $f$, and two binary operators $\oplus$ and $\otimes$, then:

This fits right in with the function we used above. $f$ is `⟦_⟧↓`

, $\oplus$ is `_⊛_`

, and $\otimes$ is whatever combining function was passed in. Let’s prove the foldr fusion law, then, before we go any further.

```
module Proofs
{a r}
{A : Set a}
{R : Rel A r}
where
infix 4 _≈_
_≈_ = R
open import Algebra.FunctionProperties _≈_
foldr-universal : Transitive _≈_
→ ∀ {b} {B : Set b} (h : List B → A) f e
→ ∀[ f ⊢ Congruent₁ ]
→ (h [] ≈ e)
→ (∀ x xs → h (x ∷ xs) ≈ f x (h xs))
→ ∀ xs → h xs ≈ foldr f e xs
foldr-universal _○_ h f e f⟨_⟩ ⇒[] ⇒_∷_ [] = ⇒[]
foldr-universal _○_ h f e f⟨_⟩ ⇒[] ⇒_∷_ (x ∷ xs) =
(⇒ x ∷ xs) ○ f⟨ foldr-universal _○_ h f e f⟨_⟩ ⇒[] ⇒_∷_ xs ⟩
foldr-fusion : Transitive _≈_
→ Reflexive _≈_
→ ∀ {b c} {B : Set b} {C : Set c} (f : C → A) {_⊕_ : B → C → C} {_⊗_ : B → A → A} e
→ ∀[ _⊗_ ⊢ Congruent₁ ]
→ (∀ x y → f (x ⊕ y) ≈ x ⊗ f y)
→ ∀ xs → f (foldr _⊕_ e xs) ≈ foldr _⊗_ (f e) xs
foldr-fusion _○_ ∎ h {f} {g} e g⟨_⟩ fuse =
foldr-universal _○_ (h ∘ foldr f e) g (h e) g⟨_⟩ ∎ (λ x xs → fuse x (foldr f e xs))
```

We’re not using the proofs in Agda’s standard library because these are tied to propositional equality. In other words, instead of using an abstract binary relation, they prove things over *actual* equality. That’s all well and good, but as you can see above, we don’t need propositional equality: we don’t even need the relation to be an equivalence, we just need transitivity and reflexivity.

After that, we can state precisely what correspondence the tree fold has, and under what conditions it does the same things as a fold:

```
module _ {_*_ : A → A → A} where
open TreeFold _*_
treeFoldHom : Transitive _≈_
→ Reflexive _≈_
→ Associative _*_
→ RightCongruent _*_
→ ∀ x xs
→ ⦅ x , xs ⦆ ≈ foldr _*_ x xs
treeFoldHom _○_ ∎ assoc *⟨_⟩ b = foldr-fusion _○_ ∎ ⟦_⟧↓ ⟦ b ⟧↑ *⟨_⟩ (⊛-hom zero)
where
⊛-hom : ∀ n x xs → ⟦ 2^ n × x ⊛ xs ⟧↓ ≈ x * ⟦ xs ⟧↓
⊛-hom n x (2^ suc m × y + ⟨⟩ ) = ∎
⊛-hom n x (2^ suc m × y + ⟨ ys ⟩) = ∎
⊛-hom n x (2^ zero × y + ⟨⟩ ) = ∎
⊛-hom n x (2^ zero × y + ⟨ ys ⟩) = ⊛-hom (suc n) (x * y) ys ○ assoc x y ⟦ ys ⟧↓
```

Consider the following implementation of the tree above in Haskell:

```
type Tree a = [(Int,a)]
cons :: (a -> a -> a) -> a -> Tree a -> Tree a
cons (*) = cons' 0
where
cons' n x [] = [(n,x)]
cons' n x ((0,y):ys) = cons' (n+1) (x * y) ys
cons' n x ((m,y):ys) = (n,x) : (m-1,y) : ys
```

The `cons`

function “increments” that list as if it were the bits of a binary number. Now, consider using the `merge`

function from above, in a pattern like this:

What does `f`

build? A list of lists, right?

Kind of. That’s what’s built in terms of the observable, but what’s actually stored in memory us a bunch of thunks. The shape of *those* is what I’m interested in. We can try and see what they look like by using a data structure that doesn’t force on merge:

Using a handy tree-drawing function, we can see what `f [1..13]`

looks like:

```
[(0,*),(1,*),(0,*)]
└1 │ ┌2 │ ┌6
│┌┤ │ ┌┤
││└3 │ │└7
└┤ │┌┤
│┌4 │││┌8
└┤ ││└┤
└5 ││ └9
└┤
│ ┌10
│┌┤
││└11
└┤
│┌12
└┤
└13
```

It’s a binomial heap! It’s a list of trees, each one contains $2^n$ elements. But they’re not in heap order, you say? Well, as a matter of fact, they *are*. It just hasn’t been evaluated yet. Once we force—say—the first element, the rest will shuffle themselves into a tree of thunks.

This illustrates a pretty interesting similarity between binomial heaps and merge sort. Performance-wise, though, there’s another interesting property: the thunks *stay thunked*. In other words, if we do a merge sort via:

We could instead freeze the fold, and look at it at every point:

```
sortPrefixes = map (foldr (merge . snd) []) . scanl (flip (cons merge . pure)) []
>>> [[],[1],[1,4],[1,2,4],[1,2,3,4],[1,2,3,4,5]]
```

And `sortPrefixes`

is only $\mathcal{O}(n^2)$ (rather than $\mathcal{O}(n^2 \log n)$). I confess I don’t know of a use for sorted prefixes, but it should illustrate the general idea: we get a pretty decent batching of operations, with the ability to freeze at any point in time. The other nice property (which I mentioned in the last post) is that any of the tree folds are extremely parallel.

There’s a great article on shuffling in Haskell which provides an $\mathcal{O}(n \log n)$ implementation of a perfect random shuffle. Unfortunately, the Fisher-Yates shuffle isn’t applicable in a pure functional setting, so you have to be a little cleverer.

The first implementation most people jump to (certainly the one I thought of) is to assign everything in the sequence a random number, and then sort according to that number. Perhaps surprisingly, this *isn’t* perfectly random! It’s a little weird, but the example in the article explains it well: basically, for $n$ elements, your random numbers will have $n^n$ possible values, but the output of the sort will have $n!$ possible values. Since they don’t divide into each other evenly, you’re going to have some extra weight on some permutations, and less on others.

Instead, we can generate a random *factoradic* number. A factoradic number is one where the $n$th digit is in base $n$. Because of this, a factoradic number with $n$ digits has $n!$ possible values: exactly what we want.

In the article, the digits of the number are used to pop values from a binary tree. Because the last digit will have $n$ possible values, and the second last $n-1$, and so on, you can keep popping without hitting an empty tree.

This has the correct time complexity—$\mathcal{O}(n \log n)$—but there’s a lot of overhead. Building the tree, then indexing into it, the rebuilding after each pop, etc.

We’d *like* to just sort the list, according to the indices. The problem is that the indices are relative: if you want to `cons`

something onto the list, you have to increment the rest of the indices, as they’ve all shifted right by one.

What we’ll do instead is use the indices as *gaps*. Our merge function looks like the following:

```
merge [] ys = ys
merge xs [] = xs
merge ((x,i):xs) ((y,j):ys)
| i <= j = (x,i) : merge xs ((y,j-i):ys)
| otherwise = (y,j) : merge ((x,i-j-1):xs) ys
```

With that, and the same `cons`

as above, we get a very simple random shuffle algorithm:

```
shuffle xs = map fst
. foldr (merge . snd) []
. foldr f (const []) xs
where
f x xs (i:is) = cons merge [(x,i)] (xs is)
```

The other interesting thing about this algorithm is that it can use Peano numbers with taking too much of a performance hit:

```
merge : ∀ {a} {A : Set a} → List (A × ℕ) → List (A × ℕ) → List (A × ℕ)
merge xs [] = xs
merge {A = A} xs ((y , j) ∷ ys) = go-r xs y j ys
where
go-l : A → ℕ → List (A × ℕ) → List (A × ℕ) → List (A × ℕ)
go-r : List (A × ℕ) → A → ℕ → List (A × ℕ) → List (A × ℕ)
go : ℕ → ℕ → A → ℕ → List (A × ℕ) → A → ℕ → List (A × ℕ) → List (A × ℕ)
go i zero x i′ xs y j′ ys = (y , j′) ∷ go-l x i xs ys
go zero (suc j) x i′ xs y j′ ys = (x , i′) ∷ go-r xs y j ys
go (suc i) (suc j) = go i j
go-l x i xs [] = (x , i) ∷ xs
go-l x i xs ((y , j) ∷ ys) = go i j x i xs y j ys
go-r [] y j ys = (y , j) ∷ ys
go-r ((x , i) ∷ xs) y j ys = go i j x i xs y j ys
shuffle : ∀ {a} {A : Set a} → List A → List ℕ → List A
shuffle {a} {A} xs i = map proj₁ (⦅ [] , zip-inds xs i ⦆)
where
open TreeFold {a} {List (A × ℕ)} merge
zip-inds : List A → List ℕ → List (List (A × ℕ))
zip-inds [] inds = []
zip-inds (x ∷ xs) [] = ((x , 0) ∷ []) ∷ zip-inds xs []
zip-inds (x ∷ xs) (i ∷ inds) = ((x , i) ∷ []) ∷ zip-inds xs inds
```

I don’t know exactly what the complexity of this is, but I *think* it should be better than the usual approach of popping from a vector.

This is just a collection of random thoughts for now, but I intend to work on using these folds to see if there are any other algorithms they can be useful for. In particular, I think I can write a version of Data.List.permutations which benefits from sharing. And I’m interested in using the implicit binomial heap for some search problems.

]]>
Part 5 of a 5-part series on Breadth-First Traversals

Tags: Haskell

Today, I’m going to look at extending the previous breadth-first traversal algorithms to arbitrary graphs (rather than just trees). Graphs with cycles are notoriously cumbersome in functional languages, so this actually proves to be a little trickier than I thought it would be. First, a quick recap.

So far, we have three major ways to traverse a tree in breadth-first order. The first is the simplest, and the fastest:

```
bfe :: Tree a -> [a]
bfe r = f r b []
where
f (Node x xs) fw bw = x : fw (xs : bw)
b [] = []
b qs = foldl (foldr f) b qs []
```

Given a tree like the following:

```
┌4
┌2┤
│ │ ┌8
│ └5┤
│ └9
1┤
│ ┌10
│ ┌6┘
└3┤
└7
```

We get:

It also demonstrates a theme that will run through this post: lists are the only *visible* data structure (other than the tree, of course). However, we are carefully batching the operations on those lists (the `foldl`

is effectively a reverse) so that they have the same complexity as if we had used a queue. In actual fact, when lists are used this way, they *are* queues: “corecursive” ones (Allison 2006; Smith 2009).

The next two functions perform a breadth-first traversal “level-wise”: instead of just returning all the nodes of the tree, we get them delimited by how far they are from the root.

```
lwe :: Tree a -> [[a]]
lwe r = f b r [] []
where
f k (Node x xs) ls qs = k (x : ls) (xs : qs)
b _ [] = []
b k qs = k : foldl (foldl f) b qs [] []
>>> lwe tree
[[1],[2,3],[4,5,6,7],[8,9,10]]
```

The above function is very clearly related to the `bfe`

function: we just add another queue (representing the current level), and work from there.

The third of these functions also does level-wise enumeration, but in a direct style (without continuations).

```
lwe :: Tree a -> [[a]]
lwe r = f r []
where
f (Node x xs) (q:qs) = (x:q) : foldr f qs xs
f (Node x xs) [] = [x] : foldr f [] xs
```

There are more techniques out there than just these three (including the one in Data.Tree), but these are my favorite, and they’re what I’ll be looking at today.

Functional programming in general excels at working with trees and similar data structures. Graphs, though, are trickier. There’s been a lot of recent work in improving the situation (Mokhov 2017), but I’m going to keep it simple today: a graph is just a function.

So the tree from above could be represented as:

As it happens, all of the algorithms that follow will work on graphs represented as rose trees (or represented any way, really).

So let’s fire up our first traversal!

```
bfs :: Graph a -> Graph a
bfs g r = f r b []
where
f x fw bw = x : fw (g x : bw)
b [] = []
b qs = foldl (foldr f) b qs []
>>> bfs graph 1
[1,2,3,4,5,6,7,8,9,10]
```

Unfortunately, this won’t handle cycles properly:

```
graph 1 = [2,3]
graph 2 = [4,5,1]
graph 3 = [6,7]
graph 5 = [8,9]
graph 6 = [10]
graph _ = []
>>> bfs graph 1
[1,2,3,4,5,1,6,7,8,9,2,3,10,4,5,1,6,7,8,9,2,3,10,4,5,1,6,7,8,9,2,3,10,4,5...
```

We need a way to mark off what we’ve already seen. The following isn’t good enough, also:

It will hang without finishing the list. The solution is to mark off nodes as we find them, with some set structure:

```
bfs :: Ord a => Graph a -> Graph a
bfs g ts = f ts b [] Set.empty
where
f x fw bw s
| Set.member x s = fw bw s
| otherwise = x : fw (g x : bw) (Set.insert x s)
b [] _ = []
b qs s = foldl (foldr f) b qs [] s
>>> bfs graph 1
[1,2,3,4,5,6,7,8,9,10]
```

The levelwise algorithm is similar:

```
lws :: Ord a => Graph a -> a -> [[a]]
lws g r = f b r [] [] Set.empty
where
f k x ls qs s
| Set.member x s = k ls qs s
| otherwise = k (x : ls) (g x : qs) (Set.insert x s)
b _ [] _ = []
b k qs s = k : foldl (foldl f) b qs [] [] s
```

The other levelwise algorithm *doesn’t* translate across so easily. To see why, let’s look at the version without cycle detection:

```
lws :: Graph a -> a -> [[a]]
lws g r = f r []
where
f x (q:qs) = (x:q) : foldr f qs (g x)
f x [] = [x] : foldr f [] (g x)
```

The recursive call is being made *depth*-first, not breadth-first. The result, of course, is breadth-first, but that’s only because the recursive call zips as it goes.

Just looking at the fourth line for now:

We want whatever process built up that `q`

to be denied access to `x`

. The following doesn’t work:

As well as being terribly slow, the later computation can diverge when it finds a cycle, and filtering won’t do anything to help that.

The solution is to “tie the knot”. We basically do two passes over the data: one to build up the “seen so far” list, and then another to do the actual search. The trick is to do both of these passes at once, and feed the result back into the demanding computation.

```
lws g r = takeWhile (not.null) (map fst (fix (f r . push)))
where
push xs = ([],Set.empty) : [ ([],seen) | (_,seen) <- xs ]
f x q@((l,s):qs)
| Set.member x s = q
| otherwise = (x:l, Set.insert x s) : foldr f qs (g x)
```

And it works!

I got the idea for this trick from the appendix of Okasaki (2000). There’s something similar in Kiselyov (2002).

Allison, Lloyd. 2006. “Circular Programs and Self-Referential Structures.” *Software: Practice and Experience* 19 (2) (October): 99–109. doi:10.1002/spe.4380190202. http://users.monash.edu/~lloyd/tildeFP/1989SPE/.

Kiselyov, Oleg. 2002. “Pure-functional transformations of cyclic graphs and the Credit Card Transform.” http://okmij.org/ftp/Haskell/AlgorithmsH.html\#ccard-transform.

Mokhov, Andrey. 2017. “Algebraic Graphs with Class (Functional Pearl).” In *Proceedings of the 10th ACM SIGPLAN International Symposium on Haskell*, 2–13. Haskell 2017. New York, NY, USA: ACM. doi:10.1145/3122955.3122956. http://doi.acm.org/10.1145/3122955.3122956.

Okasaki, Chris. 2000. “Breadth-first Numbering: Lessons from a Small Exercise in Algorithm Design.” In *Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming*, 131–136. ICFP ’00. New York, NY, USA: ACM. doi:10.1145/351240.351253. https://www.cs.tufts.edu/~nr/cs257/archive/chris-okasaki/breadth-first.pdf.

Smith, Leon P. 2009. “Lloyd Allison’s Corecursive Queues: Why Continuations Matter.” *The Monad.Reader* 14 (14) (July): 28. https://meldingmonads.files.wordpress.com/2009/06/corecqueues.pdf.

Part 2 of a 2-part series on Prime Sieves

Tags: Agda

Prime numbers in Agda are *slow*. First, they’re Peano-based, so a huge chunk of optimizations we might make in other languages are out of the window. Second, we really often want to *prove* that they’re prime, so the generation code has to carry verification logic with it (I won’t do that today, though). And third, as always in Agda, you have to convince the compiler of termination. With all of that in mind, let’s try and write a (very slow, very basic) prime sieve in Agda.

First, we can make an “array” of numbers that we cross off as we go.

```
primes : ∀ n → List (Fin n)
primes zero = []
primes (suc zero) = []
primes (suc (suc zero)) = []
primes (suc (suc (suc m))) = sieve (tabulate (just ∘ Fin.suc))
where
cross-off : Fin _ → List (Maybe (Fin _)) → List (Maybe (Fin _))
sieve : List (Maybe (Fin _)) → List (Fin _)
sieve [] = []
sieve (nothing ∷ xs) = sieve xs
sieve (just x ∷ xs) = suc x ∷ sieve (cross-off x xs)
cross-off p fs = foldr f (const []) fs p
where
B = ∀ {i} → Fin i → List (Maybe (Fin (2 + m)))
f : Maybe (Fin (2 + m)) → B → B
f _ xs zero = nothing ∷ xs p
f x xs (suc y) = x ∷ xs y
```

Very simple so far: we run through the list, filtering out the multiples of each prime as we see it. Unfortunately, this won’t pass the termination checker. This recursive call to `sieve`

is the problem:

Agda finds if a function is terminating by checking that at least one argument gets (structurally) smaller on every recursive call. `sieve`

only takes one argument (the input list), so that’s the one that needs to get smaller. In the line above, if we replaced it with the following:

We’d be good to go: `xs`

is definitely smaller than `(just x ∷ xs)`

. `cross-off x xs`

, though? The thing is, `cross-off`

returns a list of the same length that it’s given. But the function call is opaque: Agda can’t automatically see the fact that the length stays the same. Reaching for a proof here is the wrong move, though: you can get all of the same benefit by switching out the list for a length-indexed vector.

```
primes : ∀ n → List (Fin n)
primes zero = []
primes (suc zero) = []
primes (suc (suc zero)) = []
primes (suc (suc (suc m))) = sieve (tabulate (just ∘ Fin.suc))
where
cross-off : ∀ {n} → Fin _ → Vec (Maybe _) n → Vec (Maybe _) n
sieve : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))
sieve [] = []
sieve (nothing ∷ xs) = sieve xs
sieve (just x ∷ xs) = suc x ∷ sieve (cross-off x xs)
cross-off p fs = foldr B f (const []) fs p
where
B = λ n → ∀ {i} → Fin i → Vec (Maybe (Fin (2 + m))) n
f : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)
f _ xs zero = nothing ∷ xs p
f x xs (suc y) = x ∷ xs y
```

Actually, my explanation above is a little bit of a lie. Often, the way I think about dependently-typed programs has a lot to do with my intuition for “proofs” and so on. But this leads you down the wrong path (and it’s why writing a proof that `cross-off`

returns a list of the same length is the wrong move).

The actual termination checking algorithm is very simple, albeit strict: the argument passed recursively must be *structurally* smaller. That’s it. Basically, the recursive argument has to be contained in one of the arguments passed. It has nothing to do with Agda “seeing” inside the function `cross-off`

or anything like that. What we’ve done above (to make it terminate) is add another argument to the function: the length of the vector. The argument is implicit, but if we were to make it explicit in the recursive call:

We can see that it does indeed get structurally smaller.

A simple improvement we should be able to make is stopping once we hit the square root of the limit. Since we don’t want to be squaring as we go, we’ll use the following identity:

$(n + 1)^2 = n^2 + 2n + 1$

to figure out the square of the next number from the previous. In fact, we’ll just pass in the limit, and reduce it by $2n + 1$ each time, until it reaches zero:

```
primes : ∀ n → List (Fin n)
primes zero = []
primes (suc zero) = []
primes (suc (suc zero)) = []
primes (suc (suc (suc m))) = sieve 1 m (Vec.tabulate (just ∘ Fin.suc ∘ Fin.suc))
where
cross-off : ∀ {n} → ℕ → Vec (Maybe _) n → Vec (Maybe _) n
sieve : ∀ {n} → ℕ → ℕ → Vec (Maybe (Fin (3 + m))) n → List (Fin (3 + m))
sieve _ zero = List.mapMaybe id ∘ Vec.toList
sieve _ (suc _) [] = []
sieve i (suc l) (nothing ∷ xs) = sieve (suc i) (l ∸ i ∸ i) xs
sieve i (suc l) (just x ∷ xs) = x ∷ sieve (suc i) (l ∸ i ∸ i) (cross-off i xs)
cross-off p fs = Vec.foldr B f (const []) fs p
where
B = λ n → ℕ → Vec (Maybe (Fin (3 + m))) n
f : ∀ {i} → Maybe (Fin (3 + m)) → B i → B (suc i)
f _ xs zero = nothing ∷ xs p
f x xs (suc y) = x ∷ xs y
```

A slight variation on the code above (the first version) will give us the prime factors of a number:

```
primeFactors : ∀ n → List (Fin n)
primeFactors zero = []
primeFactors (suc zero) = []
primeFactors (suc (suc zero)) = []
primeFactors (suc (suc (suc m))) = sieve (Vec.tabulate (just ∘ Fin.suc))
where
sieve : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))
sieve [] = []
sieve (nothing ∷ xs) = sieve xs
sieve (just x ∷ xs) = Vec.foldr B remove b xs sieve x
where
B = λ n → ∀ {i}
→ (Vec (Maybe (Fin (2 + m))) n
→ List (Fin (3 + m)))
→ Fin i
→ List (Fin (3 + m))
b : B 0
b k zero = suc x ∷ k []
b k (suc _) = k []
remove : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)
remove y ys k zero = ys (k ∘ (nothing ∷_)) x
remove y ys k (suc j) = ys (k ∘ (y ∷_)) j
```

Adding the squaring optimization complicates things significantly:

```
primeFactors : ∀ n → List (Fin n)
primeFactors zero = []
primeFactors (suc zero) = []
primeFactors (suc (suc zero)) = []
primeFactors (suc (suc (suc m))) = sqr (suc m) m suc sieve
where
_2F-_ : ∀ {n} → ℕ → Fin n → ℕ
x 2F- zero = x
zero 2F- suc y = zero
suc zero 2F- suc y = zero
suc (suc x) 2F- suc y = x 2F- y
sqr : ∀ n
→ ℕ
→ (Fin n → Fin (2 + m))
→ (∀ {i} → Vec (Maybe (Fin (2 + m))) i → ℕ → List (Fin (3 + m)))
→ List (Fin (3 + m))
sqr n zero f k = k [] n
sqr zero (suc l) f k = k [] zero
sqr (suc n) (suc l) f k =
let x = f zero
in sqr n (l 2F- x) (f ∘ suc) (k ∘ (just x ∷_))
sieve : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → ℕ → List (Fin (3 + m))
sieve xs′ i = go xs′
where
go : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))
go [] = []
go (nothing ∷ xs) = go xs
go (just x ∷ xs) = Vec.foldr B remove (b i) xs x go
where
B = λ n → ∀ {i}
→ Fin i
→ (Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m)))
→ List (Fin (3 + m))
b : ℕ → B 0
b zero zero k = suc x ∷ k []
b zero (suc y) k = k []
b (suc n) zero k = b n x k
b (suc n) (suc y) k = b n y k
remove : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)
remove y ys zero k = ys x (k ∘ (nothing ∷_))
remove y ys (suc j) k = ys j (k ∘ (y ∷_))
```

The above sieve aren’t “true” in that each `remove`

is linear, so the performance is $\mathcal{O}(n^2)$ overall. This is the same problem we ran into with the naive infinite sieve in Haskell.

Since it bears such a similarity to the infinite sieve, we have to ask: can *this* sieve be infinite? Agda supports a notion of infinite data, so it would seem like it:

```
infixr 5 _◂_
record Stream (A : Set) : Set where
constructor _◂_
coinductive
field
head : A
tail : Stream A
open Stream
primes : Stream ℕ
primes = sieve 1 nats
where
nats : Stream ℕ
head nats = 0
tail nats = nats
sieve : ℕ → Stream ℕ → Stream ℕ
head (sieve i xs) = suc i
tail (sieve i xs) = remove i (head xs) (tail xs) (sieve ∘ suc ∘ (_+ i))
where
remove : ℕ → ℕ → Stream ℕ → (ℕ → Stream ℕ → Stream ℕ) → Stream ℕ
remove zero zero zs k = remove i (head zs) (tail zs) (k ∘ suc)
remove zero (suc z) zs k = remove i z zs (k ∘ suc)
remove (suc y) zero zs k = k zero (remove y (head zs) (tail zs) _◂_)
remove (suc y) (suc z) zs k = remove y z zs (k ∘ suc)
```

But this won’t pass the termination checker. What we actually need to prove to do so is that there are infinitely many primes: a nontrivial task in Agda.

]]>One of the favorite pastimes of both Haskell and Agda programmers alike is verifying data structures. Among my favorite examples are Red-Black trees (Might 2015; Weirich 2014, verified for balance), perfect binary trees (Hinze 1999), square matrices (Okasaki 1999a), search trees (McBride 2014, verified for balance and order), and binomial heaps (Hinze 1998, verified for structure).

There are many ways to verify data structures. One technique which has had recent massive success is to convert Haskell code to Coq, and then verify the Coq translation: this was the route taken by Breitner et al. (2018) to verify `Set`

and `IntSet`

in containers (a mammoth achievement, in my opinion).

This approach has some obvious advantages: you separate implementation from testing (which is usually a good idea), and your verification language can be different from your implementation language, with each tailored towards its particular domain.

LiquidHaskell (Bakst et al. 2018) (and other tools like it) adds an extra type system to Haskell tailor-made for verification. The added type system (refinement types) is more automated (the typechecker uses Z3), more suited for “invariant”-like things (it supports subtyping), and has a bunch of domain-specific built-ins (reasoning about sets, equations, etc.). I’d encourage anyone who hasn’t used it to give it a try: especially if you’re experienced writing any kind of proof in a language like Agda or Idris, LiquidHaskell proofs are *shockingly* simple and easy.

What I’m going to focus on today, though, is writing *correct-by-construction* data structures, using Haskell and Agda’s own type systems. In particular, I’m going to look at how to write *fast* verification. In the other two approaches, we don’t really care about the “speed” of the proofs: sure, it’s nice to speed up compilation and so on, but we don’t have to worry about our implementation suffering at runtime because of some complex proof. When writing correct-by-construction code, though, our task is doubly hard: we now have to worry about the time complexity of both the implementation *and the proofs*.

In this post, I’m going to demonstrate some techniques to write proofs that stay within the complexity bounds of the algorithms they’re verifying (without cheating!). Along the way I’m going to verify some data structures I haven’t seen verified before (a skew-binary random-access list).

To demonstrate the first two techniques, we’re going to write a type for modular arithmetic. For a more tactile metaphor, think of the flip clock:

Each digit can be incremented $n$ times, where $n$ is whatever base you’re using (12 for our flip-clock above). Once you hit the limit, it flips the next digit along. We’ll start with just one digit, and then just string them together to get our full type. That in mind, our “digit” type has two requirements:

- It should be incrementable.
- Once it hits its limit, it should flip back to zero, and let us know that a flip was performed.

Anyone who’s used a little Agda or Idris will be familiar with the `Fin`

type:

`Fin n`

is the standard way to encode “numbers smaller than `n`

”. However, for digits they’re entirely unsuitable: since the limit parameter changes on successor, the kind of increment we want is $\mathcal{O}(n)$:

```
try-suc : ∀ {n} → Fin n → Maybe (Fin n)
try-suc (suc x) = Maybe.map suc (try-suc x)
try-suc {suc n} zero with n
... | zero = nothing
... | suc _ = just (suc zero)
suc-flip : ∀ {n} → Fin n → Fin n × Bool
suc-flip {suc n} x = maybe (_, false) (zero , true) (try-suc x)
suc-flip {zero} ()
```

If we keep going down this path with proofs in mind, we might next look at the various $\leq$ proofs in the Agda standard library (here, here, and here), and see if we can we can wrangle them into doing what we want.

For me, though, this wasn’t a fruitful approach. Instead, we’ll try and think of how we’d do this without proving anything, and then see if there’s any place in the resulting data structure we can hang some proof.

So, in an unproven way, let’s start with some numbers. Since we’re going to be incrementing, they’d better be unary:

And then, for the “flippable” type, we’ll just store the limit alongside the value:

We’re not there yet: to check if we’ve gone over the limit, we’ll still have to compare `val`

and `lim`

. Hopefully you can guess the optimization we’ll make: instead of storing the limit, we’ll store the space left:

And we get our flip function:

```
suc-flip : Flipper → Flipper × Bool
suc-flip (zero & n) = (suc n & zero ), true
suc-flip (suc m & n) = (m & suc n), false
```

When there’s no space left, the digit must be maximal (9 in decimal, for instance), so it’ll be one less than the base. That lets us stick it in for the base, rather than recalculating. In the other case, we just take one from the space left, and add it to the value.

So, to “prove” this implementation, we might first reach for an equality proof that `val + space`

is equal to your base. Don’t! Both `val`

and `space`

are inductive structures, which could be giving us information on every application of `suc`

! Let’s set our sights on `val`

and see how we can hang our proofs off of it.

We’re going to upgrade our Peano number with some information, which means that our resulting type is going to look an awful lot like a Peano number. In other words, two cases: `zero`

and `suc`

.

For the `suc-case`

, remember we only want to be allowed to increment it when the space left is more than zero. So let’s encode it:

And for the `zero-case`

, the space left is just the base. So let’s stick the base into the type as well:

```
data Val (base : ℕ) : ℕ → Set where
zero-case : Val base base
suc-case : ∀ {space} → Val base (suc space) → Val base space
```

(We’ve changed around the way “base” works: it’s now one smaller. So to encode base-10 you’d have `Val 9 space`

. You can get back to the other encoding with a simple wrapper, this way just makes things slightly easier from now on).

Finally, our flipper:

```
record Flipper (base : ℕ) : Set where
constructor _&_
field
space : ℕ
val : Val base space
suc-flip : ∀ {n} → Flipper n → Flipper n × Bool
suc-flip (zero & m) = (_ & zero-case) , true
suc-flip (suc n & m) = (n & suc-case m) , false
```

Great! Everything works.

You may have noticed that the `Val`

type is actually a proof for $\geq$ in disguise:

And the flipper itself is just an existential in disguise:

```
Flipper : ℕ → Set
Flipper n = ∃ (n ≥_)
suc-flip : ∀ {n} → Flipper n → Flipper n × Bool
suc-flip (zero , m) = (_ , m≥m ), true
suc-flip (suc n , m) = (n , m≥p m), false
```

Hopefully this explanation will help you understand how to get from the specification to those 8 lines. This technique is going to come in especially handy later when we base data structures off of number systems.

For this next trick, we’ll add an extra operation to the flipper type above: conversion from a natural number. We want to be able to do it in $\mathcal{O}(n)$ time, and we won’t allow ourselves to change the original type definition. Here’s the type we’re aiming for:

We pass in a proof that the natural number we’re converting from is indeed in range (it’s marked irrelevant so we don’t pay for it). Here’s a non-answer:

While this looks fine, it’s actually the *inverse* of what we want. We defined the inductive structure to be indicated by the inequality proof itself. Let’s make the desired output explicit:

```
toNat : ∀ {n m} → n ≥ m → ℕ
toNat m≥m = zero
toNat (m≥p n≥m) = suc (toNat n≥m)
fromNat-≡ : ∀ {n} m
→ .(n≥m : n ≥ m)
→ Σ[ n-m ∈ Flipper n ] toNat (proj₂ n-m) ≡ m
```

And finally we can try an implementation:

In the `???`

there, we want some kind of successor function. The problem is that we would also need to prove that we *can* do a successor call. Except we don’t want to do that: proving that there’s space left is an expensive operation, and one we can avoid with another trick: first, we *assume* that there’s space left.

```
fromNat-≡ zero n≥m = ( _ , m≥m) , refl
fromNat-≡ (suc n) n≥m with fromNat-≡ n (m≥p n≥m)
... | (suc space , n-1), x≡m = (space , m≥p n-1), cong suc x≡m
... | (zero , n-1), refl = ???
```

But what about the second case? Well, we have to prove this impossible. What if it’s an extremely complex, expensive proof? It doesn’t matter! It will never be run! In contrast to proving the “happy path” correct, if we can confine all of the ugly complex cases to the unhappy paths, we can spend as long as we want proving them impossible without having to worry about runtime cost. Here’s the full function.

`fromNat`

implementation ```
fromNat-≡ : ∀ {n} m
→ .(n≥m : n ≥ m)
→ Σ[ n-m ∈ Flipper n ] toNat (proj₂ n-m) ≡ m
fromNat-≡ zero n≥m = ( _ , m≥m) , refl
fromNat-≡ (suc n) n≥m with fromNat-≡ n (m≥p n≥m)
... | (suc space , n-1), x≡m = (space , m≥p n-1), cong suc x≡m
... | (zero , n≥0), refl = Irrel.⊥-elim (contra _ zero n≥0 n≥m)
where
import Data.Nat.Properties as Prop
n≱sk+n : ∀ n k {sk+n} → sk+n ≡ suc k ℕ.+ n → n ≥ sk+n → ⊥
n≱sk+n n k wit (m≥p n≥sk+n) = n≱sk+n n (suc k) (cong suc wit) n≥sk+n
n≱sk+n n k wit m≥m with Prop.+-cancelʳ-≡ 0 (suc k) wit
... | ()
contra : ∀ n m → (n≥m : n ≥ m) → n ≥ suc (m ℕ.+ toNat n≥m) → ⊥
contra n m m≥m n≥st = n≱sk+n n zero (cong suc (Prop.+-identityʳ n)) n≥st
contra n m (m≥p n≥m) n≥st =
contra
n
(suc m)
n≥m
(subst (λ x → n ≥ suc x) (Prop.+-suc m (toNat n≥m)) n≥st)
fromNat : ∀ {n} m → .(n≥m : n ≥ m) → Flipper n
fromNat m n≥m = proj₁ (fromNat-≡ m n≥m)
```

We’re going to switch into Haskell now, and in particular to functional arrays. These are data structures which aren’t real arrays, but they offer you the kind of interface you’d want from an array in a functional setting. You can’t get better than $\mathcal{O}(\log n)$ indexing, unfortunately (Ben-Amram and Galil 1992), but often it’s enough.

The first “functional array” we’re going to be looking at nested binary random-access lists. It has $\mathcal{O}(\log n)$ indexing, as you might expect, and amortized single-threaded $\mathcal{O}(1)$ `cons`

.

It starts out like a binary random-access list (“random-access list” is another name for “functional array”). You can find a full explanation of the structure in your nearest copy of Purely Functional Data Structures (Okasaki 1999b), but briefly: the structure mimics a binary number, in that it’s a list of “bits”. At each set bit, it stores a tree with $2^i$ elements, where $i$ is the position in the list. In this way, every binary number $n$ has an analogous list of “bits” which contains, in total, $n$ elements.

The “nested” part refers to how we’re going to implement the trees. It works a little like this:

You might have to squint at that definition for a second to understand it: instead of storing two trees at the `Node`

constructor (which is what you’d usually do), we store a tree with double the elements. This has two advantages: all of the children have the same number of elements (this tree, for instance, is always some power of 2), and it also cuts down on memory use.

For the binary random-access list, we’ll use the nested encoding of trees to encode the contents of each bit. There’s an implementation of this very thing on Hackage (Komuves and Divianszky 2016), and Okasaki himself wrote something very similar to it (1999a), but we’re going to go a little further than both of those by indexing the type by its size. Here it is:

```
data Bit = O | I
data Seq ns a where
Nil :: Seq '[] a
Even :: Seq xs (a,a) -> Seq (O : xs) a
Odd :: a -> Seq xs (a,a) -> Seq (I : xs) a
```

The operations we’re interested will be `cons`

and `uncons`

: for the indices, they correspond to incrementing and decrementing the numbers, respectively. As such, we’ll need type-level functions for those:

```
type family Inc (ns :: [Bit]) :: [Bit] where
Inc '[] = '[I]
Inc (O : xs) = I : xs
Inc (I : xs) = O : Inc xs
```

And now the `cons`

function:

```
cons :: a -> Seq ns a -> Seq (Inc ns) a
cons x Nil = Odd x Nil
cons x (Even xs) = Odd x xs
cons x (Odd y ys) = Even (cons (x,y) ys)
```

However, we’re going to run into trouble if we try to write `uncons`

:

```
type family Dec (ns :: [Bit]) :: [Bit] where
Dec (I : xs) = O : xs
Dec (O : xs) = I : Dec xs
Dec '[] = ???
uncons :: Seq ns a -> (a, Seq (Dec ns) a)
uncons (Odd x xs) = (x, Even xs)
uncons (Even xs) = case uncons xs of
((x,y),ys) -> (x, Odd y ys)
uncons Nil = ???
```

We *should* be able to write this function without returning a `Maybe`

. Because we statically know the size, we can encode “only nonempty sequences”. The problem is that `Seq [] a`

isn’t the only non-empty sequence: there’s also `Seq [O] a`

and `Seq [O,O] a`

, and so on. Our binary number system is redundant, because it contains trailing zeroes.

We could add some kind of proof into the data structure, but that would (again) be expensive. Instead, we can make the index *itself* correct-by-construction, by choosing a non-redundant representation of binary numbers.

Here’s the trick: instead of having a list of bits, we’re going to have a list of “the distance to the next one”. This eliminates the redundancy, and translates into our data structure like so:

```
data N = Z | S N
data Nest n ns a where
Odd :: a -> (Seq ns (a,a)) -> Nest Z ns a
Even :: (Nest n ns (a,a)) -> Nest (S n) ns a
data Seq ns a where
Nil :: Seq '[] a
Cons :: Nest n ns a -> Seq (n : ns) a
```

Lovely! Crucially for our `uncons`

, we now know that any non-empty list of bits is a non-zero list of bits, so we can type “nonempty sequence” easily:

```
type family Dec (n :: N) (ns :: [N]) = (r :: [N]) | r -> n ns where
Dec (S n) ns = Z : Dec n ns
Dec Z '[] = '[]
Dec Z (n : ns) = S n : ns
uncons :: Seq (n : ns) a -> (a, Seq (Dec n ns) a)
uncons (Cons xs') = go xs'
where
go :: Nest n ns a -> (a, Seq (Dec n ns) a)
go (Odd x Nil) = (x, Nil)
go (Odd x (Cons xs)) = (x, Cons (Even xs))
go (Even xs) = case go xs of ((x,y),ys) -> (x, Cons (Odd y ys))
```

We’re still not done, though: here’s our new type family for incrementing things.

```
type family Inc (ns :: [N]) :: [N] where
Inc '[] = '[Z]
Inc (S n : ns) = Z : n : ns
Inc (Z : ns) = Carry (Inc ns)
type family Carry (ns :: [N]) :: [N] where
Carry '[] = '[]
Carry (n : ns) = S n : ns
```

The `Carry`

there is ugly, and that ugliness carries into the `cons`

function:

```
cons :: a -> Seq ns a -> Seq (Inc ns) a
cons x Nil = Cons (Odd x Nil)
cons x' (Cons xs') = go x' xs'
where
go :: a -> Nest n ns a -> Seq (Inc (n:ns)) a
go x (Even xs) = Cons (Odd x (Cons xs))
go x (Odd y Nil) = Cons (Even (Odd (x,y) Nil))
go x (Odd y (Cons ys)) = carry (go (x,y) ys)
carry :: Seq ns (a,a) -> Seq (Carry ns) a
carry Nil = Nil
carry (Cons xs) = Cons (Even xs)
```

To clean it up, we’re going to use another technique.

You occasionally see people wonder about the usual definition of addition on Peano numbers:

It’s very simple, with only two equations. When someone sees the following error, then:

`couldn't match type n with n + 0`

They might be tempted to add it as an equation to the function:

Similarly, when someone sees the other error commonly found with $+$:

`couldn't match type S n + m with n + S m`

They’ll add that equation in too! In fact, that particular equation will provide a valid definition of $+$:

So why is the first definition of + the one almost always used? Because it *maximizes output information from minimal input*. Take the second implementation above, the one with the zero on the right. In this function, we have to look at the second argument in the second clause: in other words, we don’t get to find out about the output until we’ve looked at both `n`

and `m`

. In the usual definition, if you know the first argument is `suc`

something, you also know the *output* must be `suc`

something.

Similarly with the third implementation: we have to examine the first argument in its *entirety* before we wrap the output in a constructor. Yes, we can of course prove that they’re all equivalent, but remember: proofs are expensive, and we’re looking for speed here. So the first definition of $+$ is our best bet, since it tells us the most without having to prove anything.

Looking back at our definition of `Inc`

, we can actually provide more information a little sooner:

```
type family Inc (ns :: [N]) :: [N] where
Inc '[] = '[Z]
Inc (S n : ns) = Z : n : ns
Inc (Z : ns) = Carry (Inc ns)
```

In all of the outputs, the list is non-empty. We can encode that, by having two different functions for the head and tail of the list:

```
type family IncHead (ns :: [N]) :: N where
IncHead '[] = Z
IncHead (n : ns) = IncHead' n ns
type family IncHead' (n :: N) (ns :: [N]) :: N where
IncHead' (S n) ns = Z
IncHead' Z ns = S (IncHead ns)
type family IncTail (ns :: [N]) :: [N] where
IncTail '[] = '[]
IncTail (n : ns) = IncTail' n ns
type family IncTail' (n :: N) (ns :: [N]) :: [N] where
IncTail' (S n) ns = n : ns
IncTail' Z ns = IncTail ns
type Inc (ns :: [N]) = IncHead ns : IncTail ns
```

This tells the typechecker that we’re not returning an empty sequence right away, so we don’t have to pattern-match to prove it later, giving us a more efficient function.

```
cons :: a -> Seq ns a -> Seq (Inc ns) a
cons x' xs' = Cons (go x' xs')
where
go :: a -> Seq ns a -> Nest (IncHead ns) (IncTail ns) a
go x Nil = Odd x Nil
go x (Cons (Even xs)) = Odd x (Cons xs)
go x (Cons (Odd y ys)) = Even (go (x,y) ys)
```

Briefly after introducing the binary random-access list, Okasaki describes the *skew-binary* random-access list. As well as having the same indexing cost as the type above, it supports $\mathcal{O}(1)$ `cons`

. But wait—didn’t the previous structure have $\mathcal{O}(1)$ `cons`

? Not really. Unfortunately, in a pure functional setting, imperative-style amortization measurements aren’t always valid. Say we perform a `cons`

in the worst case, and it takes $\log n$ time. In an imperative setting, that’s no problem, because all of the rest of the operations are not going to be on the worst-case. In a pure setting, though, the old structure is still sitting around. You can still access it, and you can still get that awful worst-case time.

This is where the skew binary tree comes in. It’s based on the skew binary numbers: these work similarly to binary, but you’re allowed have (at most) a single 2 digit before any ones. This gives you $\mathcal{O}(1)$ incrementing and decrementing, which is what we need here. Let’s get started.

First, our type-level numbers. We’re going to use the sparse encoding as above, but we need some way to encode “you’re only allowed one 2”. The most lightweight way to do it I can think of is by implicitly assuming the second number in the list of gaps is one less than the others. In other words, we encode a 2 with `[n, 0, m]`

. That `0`

means that at position `n`

there’s a 2, not a 1.

The corresponding type families for increment and decrement are clearly $\mathcal{O}(1)$:

```
type family Inc (ns :: [N]) = (ms :: [N]) | ms -> ns where
Inc '[] = Z : '[]
Inc (x : '[]) = Z : x : '[]
Inc (x : Z : xs) = S x : xs
Inc (x1 : S x2 : xs) = Z : x1 : x2 : xs
type family Dec (n :: N) (ns :: [N]) = (ms :: [N]) | ms -> n ns where
Dec (S x) xs = x : Z : xs
Dec Z '[] = '[]
Dec Z (x : '[]) = x : '[]
Dec Z (x1 : x2 : xs) = x1 : S x2 : xs
```

We don’t need to split this into head and tail families as we did before because there’s no recursive call: we know all we’re ever going to know about the output following *any* match on the input.

There’s another problem before we write the implementation: we can’t use the `Nest`

construction that we had before, because then the head would be buried in $\log n$ constructors (or thereabouts). Instead, we’re going to have to use GADTs to encode the “gap” type, alongside the relevant tree. This gap type is going to be very similar to the $\geq$ proof we had for the modular counters, but with an extra parameter:

`Gap n g m`

means there is a gap of `g`

between `n`

and `m`

. Or, stated another way, it means `n + g = m`

. Its inductive structure mimics the `g`

parameter (it’s basically the `g`

parameter itself with some added information).

With all of that together, here’s the definition of the array itself:

```
type family Tree (n :: N) (a :: Type) where
Tree Z a = a
Tree (S n) a = Node n a
data Node n a = Node a (Tree n a) (Tree n a)
data SeqTail (n :: N) (ns :: [N]) (a :: Type) where
NilT :: SeqTail n '[] a
ConsT :: Gap n g m
-> Tree m a
-> SeqTail (S m) ms a
-> SeqTail n (g : ms) a
data Seq (ns :: [N]) (a :: Type) where
Nil :: Seq '[] a
Cons :: Gap Z g n
-> Tree n a
-> SeqTail n ns a
-> Seq (g : ns) a
```

The `cons`

operation again mimics the increment function, but there’s one final snag before it’ll typecheck:

```
cons :: a -> Seq ns a -> Seq (Inc ns) a
cons x Nil = Cons Zy x NilT
cons x (Cons zn y NilT) = Cons Zy x (ConsT zn y NilT)
cons x (Cons zn y1 (ConsT Zy y2 ys)) = Cons(Sy zn) (Node x y1 y2) ys
cons x (Cons zn y1 (ConsT (Sy nm) y2 ys)) =
Cons Zy x (ConsT zn y1 (ConsT ??? y2 ys))
```

On the final line, the `???`

is missing. In the unverified version, `nm`

would slot right in there. Here, though, if we try it we get an error, which basically amounts to:

At this point, I’d usually throw out the inductive-style proof, and replace it with a proof of equality, which I’d aggressively erase in all of the functions. I said at the beginning I wouldn’t cheat, though, so here’s what I’ll do instead:

```
gapr :: Gap n g m -> Gap (S n) g (S m)
gapr Zy = Zy
gapr (Sy pnm) = Sy (gapr pnm)
cons :: a -> Seq ns a -> Seq (Inc ns) a
cons x Nil = Cons Zy x NilT
cons x (Cons zn y NilT) = Cons Zy x (ConsT zn y NilT)
cons x (Cons zn y1 (ConsT Zy y2 ys)) = Cons (Sy zn) (Node x y1 y2) ys
cons x (Cons zn y1 (ConsT (Sy nm) y2 ys)) =
Cons Zy x (ConsT zn y1 (ConsT (gapr nm) y2 ys))
```

At first glance, we’ve lost the complexity bounds. That `gapr`

operation is $\log n$ (or something), and we’re performing it pretty frequently. We might keep the amortized bounds, but isn’t that not really worthy in a pure setting?

That would all be true, if it weren’t for laziness. Because we *delay* the evaluation of `gapr`

, we won’t have to pay for it all in one big thunk. In fact, because it’s basically a unary number, we only have to pay for one part of it at a time. I haven’t yet fully worked out the proofs, but I’m pretty sure we’re guaranteed $\mathcal{O}(1)$ worst-case time here too.

About a year ago, I tried to write a verified version of binomial heaps, which could then be used for sorting traversable containers. Unfortunately, I couldn’t figure out how to write delete-min, and gave up. I *did* recognize that the redundancy of the binary representation was a problem, but I couldn’t figure out much more than that.

Now, though, we have a new non-redundant representation of binary numbers, and some handy techniques to go along with it.

Unfortunately, I ran into a similar roadblock in the implementation. Here’s the point where I was stuck:

```
data Zipper a n xs = Zipper a (Node n a) (Binomial n xs a)
slideLeft :: Zipper a (S n) xs -> Zipper a n (Z : xs)
slideLeft (Zipper m (t :< ts) hs) = Zipper m ts (Cons (Odd t hs))
minView :: Ord a => Binomial n (x : xs) a -> (a, Binomial n (Decr x xs) a)
minView (Cons xs') = unZipper (go xs')
where
unZipper (Zipper x _ xs) = (x, xs)
go :: forall a n x xs. Ord a => Nest n x xs a -> Zipper a n (Decr x xs)
go (Even xs) = slideLeft (go xs)
go (Odd (Root x ts) Empty) = Zipper x ts Empty
go (Odd c@(Root x ts) (Cons xs)) =
case go xs of
(Zipper m (t' :< _) hs)
| m >= x -> Zipper x ts (Cons (Even xs))
| otherwise ->
Zipper m ts
(case hs of
Empty -> Cons (Even (Odd (mergeTree c t') Empty))
Cons hs' -> Cons (Even (carryOneNest (mergeTree c t') hs')))
```

The last two lines don’t typecheck! The errors were complex, but effectively they stated:

`Could not deduce`

`x : xs ~ [Z]`

`from the context`

`Decr x xs ~ []`

and:

`Could not deduce`

`x : xs ~ Inc (y : ys)`

`from the context`

`Decr x xs ~ y : ys`

The thing is, all of those look pretty provable. So, for this technique, we first figure out what proofs we need, and *assume* we have them. This means changing `minView`

to the following:

```
data Zipper a n xs = Zipper a (Node n a) (Binomial n xs a)
slideLeft :: Zipper a (S n) xs -> Zipper a n (Z : xs)
slideLeft (Zipper m (t :< ts) hs) = Zipper m ts (Cons (Odd t hs))
minView :: Ord a => Binomial n (x : xs) a -> (a, Binomial n (Decr x xs) a)
minView (Cons xs') = unZipper (go xs')
where
unZipper (Zipper x _ xs) = (x, xs)
go :: forall a n x xs. Ord a => Nest n x xs a -> Zipper a n (Decr x xs)
go (Even xs) = slideLeft (go xs)
go (Odd (Root x ts) Empty) = Zipper x ts Empty
go (Odd c@(Root x ts) (Cons xs)) =
case go xs of
(Zipper m (t' :< _) (hs :: Binomial (S n) (Decr y ys) a))
| m >= x -> Zipper x ts (Cons (Even xs))
| otherwise ->
Zipper m ts
(case hs of
Empty -> gcastWith (lemma1 @y @ys Refl)
Cons (Even (Odd (mergeTree c t') Empty))
Cons hs' -> gcastWith (lemma2 @y @ys Refl)
Cons (Even (carryOneNest (mergeTree c t') hs')))
```

And writing in the templates for our lemmas:

```
lemma1 :: forall x xs. Decr x xs :~: '[] -> x : xs :~: Z : '[]
lemma1 = _
lemma2 :: forall x xs y ys. Decr x xs :~: y : ys -> x : xs :~: Inc (y : ys)
lemma2 = _
```

We now need to provide the *implementations* for `lemma1`

and `lemma2`

. With this approach, even if we fail to do the next steps, we can cop out here and sub in `unsafeCoerce Refl`

in place of the two proofs, maintaining the efficiency. We won’t need to, though!

Unlike in Agda, the types for those proofs won’t be around at runtime, so we won’t have anything to pattern match on. We’ll need to look for things in the surrounding area which could act like singletons for the lemmas.

It turns out that the `xs`

and `hs'`

floating around can do exactly that: they tell us about the type-level `y`

and `x`

. So we just pass them to the lemmas (where they’re needed). This changes the last 4 lines of `minView`

to:

```
Empty -> gcastWith (lemma1 Refl xs)
Cons (Even (Odd (mergeTree c t') Empty))
Cons hs' -> gcastWith (lemma2 Refl xs hs')
Cons (Even (carryOneNest (mergeTree c t') hs'))
```

Now, we just have to fill in the lemmas! If we were lucky, they’d actually be constant-time.

```
lemma1 :: forall x xs n a. Decr x xs :~: '[]
-> Nest n x xs a
-> x : xs :~: Z : '[]
lemma1 Refl (Odd _ Empty) = Refl
lemma2 :: forall x xs y ys n a.
Decr x xs :~: y : ys
-> Nest n x xs a
-> Nest n y ys a
-> x : xs :~: Inc (y : ys)
lemma2 Refl (Even (Odd _ Empty)) (Odd _ Empty) = Refl
lemma2 Refl (Odd _ (Cons _)) (Even _) = Refl
lemma2 Refl (Even xs) (Odd _ (Cons ys)) =
gcastWith (lemma2 Refl xs ys) Refl
```

If they *had* been constant-time, that would have let us throw them out: each proof would essentially show you what cases needed to be scrutinized to satisfy the typechecker. You then just scrutinize those cases in the actual function, and it should all typecheck.

As it is, `lemma2`

is actually ok. It does cost $\mathcal{O}(\log n)$, but so does `carryOneNest`

: we’ve maintained the complexity! We *could* stop here, satisfied.

There’s another option, though, one that I picked up from Stephanie Weirich’s talk (2017): you thread the requirement through the function as an equality constraint. It won’t always work, but when your function’s call graph matches that of the proof, the constraint will indeed be satisfied, with no runtime cost. In this case, we can whittle down the proof obligation to the following:

Now we change the recursive `go`

into continuation-passing style, and add that constraint to its signature, and everything works!

```
minView :: Ord a => Binomial n (x : xs) a -> (a, Binomial n (Decr x xs) a)
minView (Cons xs') = go xs' \(Zipper x _ xs) -> (x,xs)
where
go :: Ord a
=> Nest n x xs a
-> (Inc (Decr x xs) ~ (x : xs) => Zipper a n (Decr x xs) -> b) -> b
go (Even xs) k = go xs \(Zipper m (t :< ts) hs) -> k (Zipper m ts (Cons (Odd t hs)))
go (Odd (Root x ts) Empty) k = k (Zipper x ts Empty)
go (Odd c@(Root x cs) (Cons xs)) k =
go xs
\case
Zipper m _ _ | m >= x ->
k (Zipper x cs (Cons (Even xs)))
Zipper m (t :< ts) Empty ->
k (Zipper m ts (Cons (Even (Odd (mergeTree c t) Empty))))
Zipper m (t :< ts) (Cons hs) ->
k (Zipper m ts (Cons (Even (carryOneNest (mergeTree c t) hs))))
```

As I mentioned in the beginning, a huge amount of this stuff is *much* easier using other systems. On top of that, there’s currently a lot of work being done on dependent type erasure, so that proofs like the above don’t even exist at runtime. In other words, there’s a chance that all of these techniques will soon be useless!

Efficient proof-carrying code makes for an interesting puzzle, though, even if it is a bit of a hair shirt.

Fuller implementations of the structures here are in this git repository.

Bakst, Alexander, Ranjit Jhala, Ming Kawaguchi, Patrick Rondon, Eric Seidel, Michael Smith, Anish Tondwalkar, Chris Tetreault, and Niki Vazou. 2018. “LiquidHaskell: Liquid Types For Haskell.” ucsd-progsys. https://github.com/ucsd-progsys/liquidhaskell.

Ben-Amram, Amir M., and Zvi Galil. 1992. “On Pointers Versus Addresses.” *J. ACM* 39 (3) (July): 617–648. doi:10.1145/146637.146666. http://doi.acm.org/10.1145/146637.146666.

Breitner, Joachim, Antal Spector-Zabusky, Yao Li, Christine Rizkallah, John Wiegley, and Stephanie Weirich. 2018. “Ready, Set, Verify! Applying Hs-to-coq to Real-world Haskell Code (Experience Report).” *Proc. ACM Program. Lang.* 2 (ICFP) (July): 89:1–89:16. doi:10.1145/3236784. http://doi.acm.org/10.1145/3236784.

Hinze, Ralf. 1998. *Numerical Representations as Higher-Order Nested Datatypes*. Institut für Informatik III, Universität Bonn. http://www.cs.ox.ac.uk/ralf.hinze/publications/\#R5.

———. 1999. *Perfect Trees and Bit-reversal Permutations*.

Komuves, Balazs, and Peter Divianszky. 2016. “Nested-sequence: List-like data structures with O(Log(n)) random access.” http://hackage.haskell.org/package/nested-sequence.

McBride, Conor Thomas. 2014. “How to Keep Your Neighbours in Order.” In *Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming*, 297–309. ICFP ’14. New York, NY, USA: ACM. doi:10.1145/2628136.2628163. https://personal.cis.strath.ac.uk/conor.mcbride/pub/Pivotal.pdf.

Might, Matthew. 2015. “Missing method: How to delete from Okasaki’s red-black trees.” *matt.might.net*. http://matt.might.net/articles/red-black-delete/.

Okasaki, Chris. 1999a. “From Fast Exponentiation to Square Matrices: An Adventure in Types.” In *Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP’99), Paris, France, September 27-29, 1999*, 34:28. ACM. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.456.357\&rep=rep1\&type=pdf.

———. 1999b. *Purely Functional Data Structures*. Cambridge University Press.

Weirich, Stephanie. 2014. “Depending on Types.” In *Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming*, 241–241. ICFP ’14. New York, NY, USA: ACM. doi:10.1145/2628136.2631168. https://www.cis.upenn.edu/~sweirich/talks/icfp14.pdf.

———. 2017. “Dependent Types in Haskell.” St. Louis, MO, USA. https://www.youtube.com/watch?v=wNa3MMbhwS4.

Part 1 of a 2-part series on Prime Sieves

Tags: Haskell

A few days ago, the Computerphile YouTube channel put up a video about infinite lists in Haskell (Haran 2018). It’s pretty basic, but finishes up with a definition of an infinite list of prime numbers. The definition was something like this:

Unfortunately, and contrary to what’s stated in the video, this *isn’t* a genuine sieve of Eratosthenes. For one thing, it uses trial division where the sieve will only use addition and multiplication, and for another, every single new number is tested against every previous prime (until it finds a divisor).

While obviously this isn’t super important (no-one is using this algorithm to generate primes efficiently (I hope)), it still pops up from time to time on the internet as an example of “Haskell programmers overselling their language”. What’s more, people will then point to the famous *genuine* sieve of Eratosthenes in Haskell (O’Neill 2009), and say something to the effect of “Haskell is so difficult a simple prime sieve took a journal paper to implement properly”.

I think this is mainly silly: the example implementation is meant to showcase how well Haskell does recursion, list processing, and lazy evaluation. What’s more, the “genuine” prime sieve is difficult not because we’re working in Haskell, but because we’re working to generate *infinite primes*. The “simple” implementation of a prime sieve in something like Python will assume a fixed upper bound, and work with finite arrays and so on. Figuring out an efficient implementation for an infinite prime sieve there is even *more* difficult than it was in Haskell.

Anyway, I’m going to try today to show a very simple prime sieve that (hopefully) rivals the simplicity of the definition above.

Visualizations of the sieve of Eratosthenes often rely on metaphors of “crossing out” on some large table. Once you hit a prime, you cross off all of its multiples in the rest of the table, and then you move to the next crossed-off number.

Working with a finite array, it should be easy to see that this is extremely efficient. You’re crossing off every non-prime exactly once, only using addition and squaring.

To extend it to infinite lists, we will use the following function:

```
[] \\ ys = []
xs \\ [] = xs
(x:xs) \\ (y:ys) = case compare x y of
LT -> x : xs \\ (y:ys)
EQ -> xs \\ ys
GT -> (x:xs) \\ ys
```

We’re “subtracting” the right list from the left. Crucially, it works with infinite lists:

Finally, it only works if both lists are ordered and don’t contain duplicates, but our sieve does indeed satisfy that requirement. Using this, we’ve already got a sieve:

No division, just addition and squaring, as promised. Unfortunately, though, this doesn’t have the time complexity we want. See, in the `(\\)`

operation, we have to test every entry in the sieve against the prime factor: when we’re crossing off from an array, we just jump to the next composite number.

The way we speed up the “crossing-off” section of the algorithms is by using a priority queue: this was the optimization provided in O’Neill (2009). Before we go any further, then, let’s put one together:

```
infixr 5 :-
data Queue a b = Queue
{ minKey :: !a
, minVal :: b
, rest :: List a b
}
data List a b
= Nil
| (:-) {-# UNPACK #-} !(Queue a b)
(List a b)
(<+>) :: Ord a => Queue a b -> Queue a b -> Queue a b
(<+>) q1@(Queue x1 y1 ts1) q2@(Queue x2 y2 ts2)
| x1 <= x2 = Queue x1 y1 (q2 :- ts1)
| otherwise = Queue x2 y2 (q1 :- ts2)
mergeQs :: Ord a => List a b -> Queue a b
mergeQs (t :- Nil) = t
mergeQs (t1 :- t2 :- Nil) = t1 <+> t2
mergeQs (t1 :- t2 :- ts) = (t1 <+> t2) <+> mergeQs ts
mergeQs Nil = errorWithoutStackTrace "tried to merge empty list"
insert :: Ord a => a -> b -> Queue a b -> Queue a b
insert !k !v = (<+>) (singleton k v)
singleton :: a -> b -> Queue a b
singleton !k !v = Queue k v Nil
```

These are pairing heaps: I’m using them here because they’re relatively simple and very fast. A lot of their speed comes from the fact that the top-level constructor (`Queue`

) is *non-empty*. Since, in this algorithm, we’re only actually going to be working with non-empty queues, this saves us a pattern match on pretty much every function. They’re also what’s used in Data.Sequence for sorting.

With that, we can write our proper sieve:

```
insertPrime x xs = insert (x*x) (map (*x) xs)
adjust x q@(Queue y (z:zs) qs)
| y <= x = adjust x (insert z zs (mergeQs qs))
| otherwise = q
sieve (x:xs) = x : sieve' xs (singleton (x*x) (map (*x) xs))
where
sieve' (x:xs) table
| minKey table <= x = sieve' xs (adjust x table)
| otherwise = x : sieve' xs (insertPrime x xs table)
primes = 2 : sieve [3,5..]
```

The priority queue stores lists alongside their keys: what you might notice is that those lists are simply sequences of the type $[x, 2x, 3x, 4x...]$ and so on. Rather than storing the whole list, we can instead store just the head and the step. This also simplifies (and greatly speeds up) the expensive `map (*x)`

operation to just *two* multiplications. If you wanted, you could just sub in this representation of streams for all the lists above:

```
data Stepper a = Stepper { start :: a, step :: a }
nextStep :: Num a => Stepper a -> (a, Stepper a)
nextStep (Stepper x y) = (x, Stepper (x+y) y)
pattern x :- xs <- (nextStep -> (x,xs))
(^*) :: Num a => Stepper a -> a -> Stepper a
Stepper x y ^* f = Stepper (x * f) (y * f)
```

If you were so inclined, you could even make it conform to `Foldable`

:

```
data Stepper a where
Stepper :: Num a => a -> a -> Stepper a
nextStep (Stepper x y) = (x, Stepper (x+y) y)
pattern x :- xs <- (nextStep -> (x,xs))
instance Foldable Stepper where
foldr f b (x :- xs) = f x (foldr f b xs)
```

But that’s overkill for what we need here.

Second observation is that if we remove the wheel (from 2), the “start” is simply the *key* in the priority queue, again cutting down on space.

Finally, we get the implementation:

```
primes = 2 : sieve 3 (singleton 4 2)
where
adjust x q@(Queue y z qs)
| x < y = q
| otherwise = adjust x (insert (y + z) z (mergeQs qs))
sieve x q
| x < minKey q = x : sieve (x + 1) (insert (x * x) x q)
| otherwise = sieve (x + 1) (adjust x q)
```

8 lines for a lazy prime sieve isn’t bad!

I haven’t tried a huge amount to optimize the function, but it might be worth looking in to how to add back the wheels. I noticed that for no wheels, the queue contains only two elements per key; for one (the 2 wheel), we needed 3. I wonder if this pattern continues: possibly we could represent wheels as finite lists at each key in the queue. Maybe in a later post.

Haran, Brady. 2018. “To Infinity & Beyond - Computerphile.” https://www.youtube.com/watch?v=bnRNiE_OVWA\&feature=youtu.be.

O’Neill, Melissa E. 2009. “The Genuine Sieve of Eratosthenes.” *Journal of Functional Programming* 19 (01) (January): 95. doi:10.1017/S0956796808007004. https://pdfs.semanticscholar.org/b7d6/7986e54f852de25e2d803472f31fb53184d5.pdf.

Part 1 of a 1-part series on Total Combinatorics

Here’s a quick puzzle: from a finite alphabet, produce an infinite list of infinite strings, each of them unique.

It’s not a super hard problem, but here are some examples of what you might get. Given the alphabet of `0`

and `1`

, for instance, you could produce the following:

```
0000000...
1000000...
0100000...
1100000...
0010000...
1010000...
0110000...
1110000...
0001000...
```

In other words, the enumeration of the binary numbers (least-significant-digit first). We’ll just deal with bits first:

```
data Bit = O | I
instance Show Bit where
showsPrec _ O = (:) '0'
showsPrec _ I = (:) '1'
showList xs s = foldr f s xs
where
f O a = '0' : a
f I a = '1' : a
```

Thinking recursively, we can see that the tail of each list is actually the original sequence, doubled-up:

` 0000000... `

1000000...

0100000...

1100000...

0010000...

1010000...

0110000...

1110000...

0001000...

As it happens, we get something like this pattern with the monad instance for lists *anyway*:

Well, actually it’s the wrong way around. We want to loop through the *first* list the quickest, incrementing the second slower. No worries, we can just use a flipped version of `<*>`

:

```
infixl 4 <<>
(<<>) :: Applicative f => f (a -> b) -> f a -> f b
fs <<> xs = flip ($) <$> xs <*> fs
>>> (,) <$> [O,I] <<> "abc"
[(0,'a'),(1,'a'),(0,'b'),(1,'b'),(0,'c'),(1,'c')]
```

Brilliant! So we can write our function now, yes?

Nope! That won’t ever produce an answer, unfortunately.

The issue with our definition above is that it’s not lazy enough: it demands information that it hasn’t produced yet, so it gets caught in an infinite loop before it can do anything!

We need to kick-start it a little, so it can produce output *before* it asks itself for more. Because we know what the first line is going to be, we can just tell it that:

```
bins = (:) <$> [O,I] <<> (repeat O : tail bins)
>>> mapM_ print (take 8 (map (take 3) bins))
000
100
010
110
001
101
011
111
```

The property that this function has that the previous didn’t is *productivity*: the dual of termination. See, we want to avoid a *kind* of infinite loops in `bins`

, but we don’t want to avoid infinite things altogether: the list it produces is meant to be infinite, for goodness’ sake. Instead, what it needs to do is produce every new value in *finite* time.

In total languages, like Agda, termination checking is a must. To express computation like that above, though, you often also want a *productivity* checker. Agda can do that, too.

Let’s get started then. First, a stream:

```
infixr 5 _◂_
record Stream {a} (A : Set a) : Set a where
coinductive
constructor _◂_
field
head : A
tail : Stream A
open Stream
```

In Haskell, there was no need to define a separate stream type: the type of lists contains both finite and infinite lists.

Agda can get a little more specific: here, we’ve used the `coinductive`

keyword, which means we’re free to create infinite `Stream`

s. Rather than the usual termination checking (which would kick in when we consume a recursive, inductive type), we now get productivity checking: when creating a `Stream`

, the `tail`

must always be available in finite time. For a finite type, we’d have used the `inductive`

keyword instead; this wouldn’t be much use, though, since there’s no way to create a finite `Stream`

without a nil constructor!^{1}

One of the interesting things about working with infinite data (when you’re forced to notice that it’s infinite, as you are in Agda) is that *everything* gets flipped. So you have to prove productivity, not totality; you use product types, rather than sums; and to define functions, you use *co*patterns, rather than patterns.

Copatterns are a handy syntactic construct for writing functions about record types. Let’s start with an example, and then I’ll try explain a little:

Here, we’re defining `pure`

on streams: `pure x`

produces an infinite stream of `x`

. Its equivalent would be repeat in Haskell:

Except instead of describing what it *is*, you describe how it *acts* (it’s kind of an intensional vs. extensional thing). In other words, if you want to make a stream `xs`

, you have to answer the questions “what’s the head of `xs`

?” and “what’s the tail of `xs`

?”

Contrast this with pattern-matching: we’re producing (rather than consuming) a value, and in pattern matching, you have to answer a question for each *case*. If you want to consume a list `xs`

, you have to answer the questions “what do you do when it’s nil?” and “what do you do when it’s cons?”

Anyway, I think the symmetry is kind of cool. Let’s get back to writing our functions.

Unfortunately, we don’t have enough to prove productivity yet. As an explanation why, let’s first try produce the famous `fibs`

list. Written here in Haskell:

Instead of `zipWith`

, let’s define `<*>`

. That will let us use idiom brackets.

```
_<*>_ : ∀ {a b} {A : Set a} {B : Set b}
→ Stream (A → B)
→ Stream A
→ Stream B
head (fs <*> xs) = head fs (head xs)
tail (fs <*> xs) = tail fs <*> tail xs
```

And here’s `fibs`

:

But it doesn’t pass the productivity checker! Because we use a higher-order function (`<*>`

), Agda won’t look at how much it dips into the infinite supply of values. This is a problem: we need it to know that `<*>`

only needs the heads of its arguments to produce a head, and so on. The solution? Encode this information in the types.

```
infixr 5 _◂_
record Stream {i : Size} {a} (A : Set a) : Set a where
coinductive
constructor _◂_
field
head : A
tail : ∀ {j : Size< i} → Stream {j} A
open Stream
```

Now, `Stream`

has an implicit *size* parameter. Basically, `Stream {i} A`

can produce `i`

more values. So `cons`

, then, gives a stream one extra value to produce:

```
cons : ∀ {i a} {A : Set a} → A → Stream {i} A → Stream {↑ i} A
head (cons x xs) = x
tail (cons x xs) = xs
```

Conversely, we can write a different definition of `tail`

that consumes one value^{2}:

For `<*>`

, we want to show that its result can produce just as much values as its inputs can:

```
_<*>_ : ∀ {i a b} {A : Set a} {B : Set b}
→ Stream {i} (A → B)
→ Stream {i} A
→ Stream {i} B
head (fs <*> xs) = head fs (head xs)
tail (fs <*> xs) = tail fs <*> tail xs
```

How does this help the termination/productivity checker? Well, for terminating functions, we have to keep giving the `tail`

field smaller and smaller sizes, meaning that we’ll eventually hit zero (and terminate). For productivity, we now have a way to talk about “definedness” in types, so we can make sure that a recursive call doesn’t dip into a supply it hasn’t produced yet.

One more thing: `Size`

types have strange typing rules, mainly for ergonomic purposes (this is why we’re not just using an `ℕ`

parameter). One of them is that if you don’t specify the size, it’s defaulted to `∞`

, so functions written without size annotations don’t have to be changed with this new definition:

Finally `fibs`

:

```
fibs : ∀ {i} → Stream {i} ℕ
head fibs = 0
head (tail fibs) = 1
tail (tail fibs) = ⦇ fibs + tail fibs ⦈
```

Before I show the Agda solution, I’d like to point out some bugs that were revealed in the Haskell version by trying to implement it totally. First of all, the function signature. “Takes an alphabet and produces unique strings” seems like this:

But what should you produce in this case:

So it must be a non-empty list, giving us the following type and definition:

```
strings :: NonEmpty a -> [[a]]
strings (x :| xs) = (:) <$> (x:xs) <<> (repeat x : tail (strings (x :| xs)))
```

But this has a bug too! What happens if we pass in the following:

So this fails the specification: there is only one unique infinite string from that alphabet (`pure x`

). Interestingly, though, our implementation above also won’t produce any output beyond the first element. I suppose, in a way, these things cancel each other out: our function does indeed produce all of the unique strings, it’s just a pity that it goes into an infinite loop to do so!

Finally, we have our function:

```
strings : ∀ {i a} {A : Set a} → A × A × List A → Stream {i} (Stream A)
head (strings (x , _ , _)) = pure x
tail (strings {A = A} xs@(x₁ , x₂ , xt)) = go x₂ xt (strings xs)
where
go : ∀ {i} → A → List A → Stream {i} (Stream A) → Stream {i} (Stream A)
head (head (go y ys zs)) = y
tail (head (go y ys zs)) = head zs
tail (go _ [] zs) = go x₁ (x₂ ∷ xt) (tail zs)
tail (go _ (y ∷ ys) zs) = go y ys zs
```

As you can see, we do need to kick-start it without a recursive call (the first line is `pure x`

). Then, `go`

takes as a third argument the “tails” argument, and does the kind of backwards Cartesian product we want. However, since we’re into the second element of the stream now, we want to avoid repeating what we already said, which is why we have to give `go`

`x₂`

, rather than `x₁`

. This is what forces us to take at least two elements, rather than at least one, also: we can’t just take the tail of the call to `go`

(this is what we did in the Haskell version of `strings`

with the `NonEmpty`

list), as the recursive call to strings then doesn’t decrease in size:

```
strings : ∀ {i a} {A : Set a} → A × List A → Stream {i} (Stream A)
head (strings (x , _)) = pure x
tail (strings {A = A} xs@(x , xt)) = tail (go x xt (strings xs))
where
go : ∀ {i} → A → List A → Stream {i} (Stream A) → Stream {i} (Stream A)
head (head (go y ys zs)) = y
tail (head (go y ys zs)) = head zs
tail (go _ [] zs) = go x xt (tail zs)
tail (go _ (y ∷ ys) zs) = go y ys zs
```

Agda will warn about termination on this function. Now, if you slap a pragma on it, it *will* produce the correct results for enough arguments, but give it one and you’ll get an infinite loop, just as you were warned!

I’m having a lot of fun with copatterns for various algorithms (especially combinatorics). I’m planning on working on two particular tasks with them for the next posts in this series:

- Proving
`strings`

I’d like to prove that

`strings`

does indeed produce a stream of unique values. Following from that, it would be cool to do a Cantor diagonalisation on its output.- Permutations
Haskell’s permutations implementation in Data.List does some interesting tricks to make it as lazy as possible. It would be great to write an implementation that is verified to be as lazy as possible: the pattern of “definedness” is complex, though, so I don’t know if it’s possible with Agda’s current sized types.

Thanks to gelisam for pointing out the poor phrasing here. Updated on 2018/10/16↩

You might wonder why the definition of

`tail`

doesn’t have this signature to begin with. The reason is that our record type must be*parameterized*(not indexed) over its size (as it’s a record type), so we use a less-than proof instead.↩

Tags: Agda

I’m in the middle of quite a large Agda project at the moment, and I’ve picked up a few tips and tricks in the past few weeks. I’d imagine a lot of these are quite obvious once you get to grips with Agda, so I’m writing them down before I forget that they were once confusing stumbling blocks. Hopefully this helps other people trying to learn the language!

Agda lets you parameterize modules, just as you can datatypes, with types, values, etc. It’s extremely handy for those situations where you want to be generic over some type, but that type won’t change inside the generic code. The keys to dictionaries is a good example: you can start the module with:

And now, where in Haskell you’d have to write something like `Ord a => Map a`

… in pretty much any function signature, you can just refer to `Key`

, and you’re good to go. It’s kind of like a dynamic type synonym, in that way.

Here’s the strangeness, though: what if you don’t supply one of the arguments?

This won’t give you a type error, strange as it may seem. This will perform *lambda lifting*, meaning that now, every function exported by the module will have the type signature:

Preceding its normal signature. In other words, it changes it into what you would have had to write in Haskell.

This is a powerful feature, but it can also give you some confusing errors if you don’t know about it (especially if the module has implicit arguments).

If you’ve got a hole in your program, you can put the cursor in it and press `SPC-m-a`

(in spacemacs), and Agda will try and find the automatic solution to the problem. For a while, I didn’t think much of this feature, as rare was the program which Agda could figure out. Turns out I was just using it wrong! Into the hole you should type the options for the proof search: enabling case-splitting (`-c`

), enabling the use of available definitions (`-r`

), and listing possible solutions (`-l`

).

Often, a program will not be obviously terminating (according to Agda’s termination checker). The first piece of advice is this: *don’t* use well-founded recursion. It’s a huge hammer, and often you can get away with fiddling with the function (try inlining definitions, rewriting generic functions to monomorphic versions, or replacing with-blocks with helper functions), or using one of the more lightweight techniques out there.

However, sometimes it really is the best option, so you have to grit your teeth and use it. What I expected (and what I used originally) was a recursion combinator, with a type something like:

So we’re trying to generate a function of type `A → B`

, but there’s a hairy recursive call in there somewhere. Instead we use this function, and pass it a version of our function that uses the supplied function rather than making a recursive call:

In other words, instead of calling the function itself, you call `recursive-call`

above. Along with the argument, you supply a proof that it’s smaller than the outer argument (`y < x`

; assume for now that the definition of `<`

is just some relation like `_<_`

in Data.Nat).

But wait! You don’t have to use it! Instead of all that, you can just pass the `Acc _<_ x`

type as a parameter to your function. In other words, if you have a dangerous function:

Instead write:

Once you pattern match on the accessibility relation, the termination checker is satisfied. This is much easier to understand (for me anyway), and made it *much* easier to write proofs about it.

Thanks to Oleg Grenrus (phadej) on irc for helping me out with this! Funnily enough, he actually recommended the `Acc`

approach, and I instead originally went with the recursion combinator. Would have saved a couple hours if I’d just listened! Also worth mentioning is the approach recommended by Guillaume Allais (gallais), detailed here. Haven’t had time to figure it out, so this article may be updated to recommend it instead in the future.

This one is really important. If I hadn’t read the exact explanation here I think I may have given up with Agda (or at the very least the project I’m working on) out of frustration.

Basically the problem arises like this. Say you’re writing a function to split a vector in two. You can specify the type pretty precisely:

Try to pattern-match on `xs`

, though, and you’ll get the following error:

```
I'm not sure if there should be a case for the constructor [],
because I get stuck when trying to solve the following unification
problems (inferred index ≟ expected index):
zero ≟ n + m
when checking that the expression ? has type Vec .A .n × Vec .A .m
```

What?! That’s weird. Anyway, you fiddle around with the function, end up pattern matching on the `n`

instead, and continue on with your life.

What about this, though: you want to write a type for proofs that one number is less than or equal to another. You go with something like this:

And you want to use it in a proof. Here’s the example we’ll be using: if two numbers are less than some limit `u`

, then their maximum is also less than that limit:

```
max : ℕ → ℕ → ℕ
max zero m = m
max (suc n) zero = suc n
max (suc n) (suc m) = suc (max n m)
max-≤ : ∀ n m {u} → n ≤ u → m ≤ u → max n m ≤ u
max-≤ n m (proof k) m≤u = {!!}
```

It won’t let you match on `m≤u`

! Here’s the error:

```
I'm not sure if there should be a case for the constructor proof,
because I get stuck when trying to solve the following unification
problems (inferred index ≟ expected index):
m₁ + k₂ ≟ n₁ + k₁
when checking that the expression ? has type max n m ≤ n + k
```

What do you *mean* you’re not sure if there’s a case for the constructor `proof`

: it’s the *only* case!

The problem is that Agda is trying to *unify* two types who both have calls to user-defined functions in them, which is a hard problem. As phrased by Conor McBride:

When combining prescriptive and descriptive indices, ensure both are in constructor form. Exclude defined functions which yield difficult unification problems.

So if you ever get the “I’m not sure if…” error, try either to:

- Redefine the indices so they use constructors, not functions.
- Remove the index, instead having a proof inside the type of equality. What does that mean? Basically, transform the definition of
`≤`

above into the one in Data.Nat.

The use-case I had for this is a little long, I’m afraid (too long to include here), but it *did* come in handy. Basically, if you’re trying to prove something about a function, you may well want to *run* that function and pattern match on the result.

This is a little different from the normal way of doing things, where you’d pattern match on the argument. It is a pattern you’ll sometimes need to write, though. And here’s the issue: that `y`

has nothing to do with `f x`

, as far as Agda is concerned. All you’ve done is introduced a new variable, and that’s that.

This is exactly the problem `inspect`

solves: it runs your function, giving you a result, but *also* giving you a proof that the result is equal to running the function. You use it like this:

```
f-is-the-same-as-g : ∀ x → f x ≡ g x
f-is-the-same-as-g x with f x | inspect f x
f-is-the-same-as-g x | y | [ fx≡y ] = {!!}
```

Because the Agda standard library is a big fan of type synonyms (`Op₂ A`

instead of `A → A → A`

for example), it’s handy to know that pressing `SPC-G-G`

(in spacemacs) over any identifier will bring you to the definition. Also, you can normalize a type with `SPC-m-n`

.

This one is a little confusing, because Agda’s notion of “irrelevance” is different from Idris’, or Haskell’s. In all three languages, irrelevance is used for performance: it means that a value doesn’t need to be around at runtime, so the compiler can elide it.

That’s where the similarities stop though. In Haskell, *all* types are irrelevant: they’re figments of the typechecker’s imagination. You can’t get a type at runtime full stop.

In dependently typed languages, this isn’t a distinction we can rely on. The line between runtime entities and compile-time entities is drawn elsewhere, so quite often types *need* to exist at runtime. As you might guess, though, they don’t always need to. The length of a length-indexed vector, for instance, is completely determined by the structure of the vector: why would you bother storing all of that information at runtime? This is what Idris recognizes, and what it tries to remedy: it analyses code for these kinds of opportunities for elision, and does so when it can. Kind of like Haskell’s fusion, though, it’s an invisible optimization, and there’s no way to make Idris throw a type error when it can’t elide something you want it to elide.

Agda is totally different. Something is irrelevant in Agda if it’s *unique*. Or, rather, it’s irrelevant if all you rely on is its existence. It’s used for proofs that you carry around with you: in a rational number type, you might use it to say that the numerator and denominator have no common factors. The only information you want from this proof is whether it holds or not, so it’s the perfect candidate for irrelevance.

Weirdly, this means it’s useless for the length-indexed vector kind of stuff mentioned above. In fact, it doe exactly the opposite of what you might expect: if the length parameter is marked as irrelevant, the the types `Vec A n`

and `Vec A (suc n)`

are the same!

The way you *can* use it is to pattern-match if it’s impossible. Again, it’s designed for eliding proofs that you may carry with you otherwise.

Once I’m finished the project, I’ll try write up a guide on how to do literate Agda files. There were a couple of weird nuances that I had to pick up on the way, mainly to do with getting unicode to work.

]]>I’ve been writing a lot of Agda recently, and had the occasion to write a Fenwick tree that did some rebalancing. I went with AVL-style rebalancing (rather than red-black or trees of bounded balance). I’d written pretty full implementations of the other two before, and the Agda standard library (Danielsson 2018) has an implementation already that I was able to use as a starting point. Also, apparently, AVL trees seem to perform better than red-black trees in practice (Pfaff 2004).

This post will be similar in style to Stephanie Weirich’s talk (2014), which compares an Agda implementation of verified red-black trees to a Haskell one. When there’s two columns of code side-by-side, the left-hand side is Haskell, the right Agda.

The method of constructing the ordering proof is taken from “How to Keep Your Neighbours in Order” (2014) by Conor McBride; the structural proofs are somewhat inspired by the implementation in the Agda standard library, but are mainly my own.

AVL trees are more strictly balanced than red-black trees: the height of neighboring subtrees can differ by at most one. To store the height, we will start as every dependently-typed program does: with Peano numbers.

Haskell

The trees will be balanced one of three possible ways: left-heavy, right-heavy, or even. We can represent these three cases in a GADT in the case of Haskell, or an indexed datatype in the case of Agda:

Those unfamiliar with Agda might be a little intimidated by the mixfix operator in the balance definition: we’re using it here because the type can be seen of a proof that:

$max(x,y) = z$

Or, using the $\sqcup$ operator:

$(x \sqcup y) = z$

We’ll use this proof in the tree itself, as we’ll need to know the maximum of the height of a node’s two subtrees to find the height of the node. Before we do that, we’ll need a couple helper functions for manipulating the balance:

Along with the verification of the structure of the tree, we will also want to verify that its contents are ordered correctly. Unfortunately, this property is a little out of reach for Haskell, but it’s 100% doable in Agda. First, we’ll need a way to describe orders on a data type. In Haskell, we might write:

That `Bool`

throws away any information gained in the comparison, though: we want to supply a proof with the result of the comparison. First, equality:

This is one of the many ways to describe equality in Agda. It’s a type with only one constructor, and it can only be constructed when its two arguments are the same. When we pattern match on the constructor, then, we’re given a proof that whatever things those arguments refer to must be the same.

Next, we need to describe an order. For this, we’ll need two types: the empty type, and the unit type.

These are kind of like type-level Bools, with one extra, powerful addition: they keep their proof after construction. Because `⊥`

has no constructors, if someone tells you they’re going to give you one, you can be pretty sure they’re lying. How do we use this? Well, first, on the numbers:

Therefore, if we ask for something of type `x ℕ< y`

(for some `x`

and `y`

), we know that it only exists when `x`

really is less than `y`

(according to the definition above).

For our actual code, we’ll parameterize the whole thing over some abstract key type. We’ll do this using a module (a feature recently added to Haskell, as it happens). That might look something like this:

(the `k`

and `r`

here, as well as the `Lift`

ing noise below, are to do with Agda’s universe system, which I’ll try explain in a bit)

Now, the trick for the ordering is to keep a proof that two neighboring values are ordered correctly in the tree at each leaf (as there’s a leaf between every pair of nodes, this is exactly the place you *should* store such a proof). A problem arises with the extremal leaves in the tree (leftmost and rightmost): each leaf is missing one neighboring value, so how can it store a proof of order? The solution is to affix two elements to our key type which we define as the greatest and least elements of the set.

After all that, we can get bring back Haskell into the story, and define or tree types:

The two definitions are similar, but have a few obvious differences. The Agda version stores the ordering proof at the leaves, as well as the bounds as indices. Its *universe* is also different: briefly, universes are one of the ways to avoid Russell’s paradox when you’re dealing with dependent types.

In normal, standard Haskell, we think of types as things that describe values (how quaint!). When you’ve got a list, everything in the list has the same type, and that is good and right.

These days, though, we’re not so constrained:

This can quite happily store elements of different types:

And look at that bizarre-looking list on the wrong side of “`::`

”! Types aren’t just describing values, they’re acting like values themselves. What type does `[Bool, String, Integer]`

even have, anyway? Why, `[Type]`

of course!

So we see that types can be put in lists, and types have types: the natural question then is:

And this is where Haskell and Agda diverge: in Haskell, we say `Type :: Type`

(as the old extension `TypeInType`

implied), and that’s that. From a certain point of view, we’ve opened the door to Russell’s paradox (we’ve allowed a set to be a member of itself). This isn’t an issue in Haskell, though, as the type-level language was already inconsistent.

Agda goes another way, saying that `Set`

(Agda’s equivalent for `Type`

) has the type `Set₁`

, and `Set₁`

has the type `Set₂`

, and so on^{1}. These different sets are called “universes” and their numbers “levels”. When we write `k ⊔ v ⊔ r`

, we’re saying we want to take the greatest universe level from those three possible levels: the level of the key, the value, and the relation, respectively.

AVL trees maintain their invariants through relatively simple rotations. We’ll start with the right rotation, which fixes an imbalance of two on the left. Because the size of the tree returned might change, we’ll need to wrap it in a datatype:

We could actually have the Agda definition be the same as Haskell’s, it doesn’t make much difference. I’m mainly using it here to demonstrate dependent pairs in Agda. The first member of the pair is just a boolean (increased in height/not increased in height). The second member is a tree whose height *depends* on the actual value of the boolean. The `∃`

business is just a fancy syntax; it also waggles its eyebrows at the way a (dependent) pair of type `(x , y)`

means “There exists an x such that y”.

Using this, we can write the type for right-rotation:

There are two possible cases, single rotation:

And double:

I won’t bore you with left-rotation: suffice to say, it’s the opposite of right-rotation.

Finally, the main event: insertion. Once the above functions have all been defined, it’s not very difficult, as it happens: by and large, the types guide you to the right answer. Of course, this is only after we decided to use the pivotal pragmatism and balance approach.

```
insertWith
:: Ord k
=> (v -> v -> v)
-> k
-> v
-> Tree h k v
-> Tree k v ++? h
insertWith _ v vc Leaf =
Incr (Node v vc O Leaf Leaf)
insertWith f v vc (Node k kc bl tl tr) =
case compare v k of
LT ->
case insertWith f v vc tl of
Stay tl' ->
Stay (Node k kc bl tl' tr)
Incr tl' -> case bl of
L -> rotr k kc tl' tr
O -> Incr (Node k kc L tl' tr)
R -> Stay (Node k kc O tl' tr)
EQ ->
Stay (Node v (f vc kc) bl tl tr)
GT ->
case insertWith f v vc tr of
Stay tr' ->
Stay (Node k kc bl tl tr')
Incr tr' -> case bl of
L -> Stay (Node k kc O tl tr')
O -> Incr (Node k kc R tl tr')
R -> rotl k kc tl tr'
```

```
insert : ∀ {l u h v}
{V : Key → Set v}
(k : Key)
→ V k
→ (V k → V k → V k)
→ Tree V l u h
→ l < k < u
→ Tree V l u 1?+⟨ h ⟩
insert v vc f (leaf l<u) (l , u) =
1+ (node v vc ▽ (leaf l) (leaf u))
insert v vc f (node k kc bl tl tr) prf
with compare v k
insert v vc f (node k kc bl tl tr) (l , _)
| tri< a _ _ with insert v vc f tl (l , a)
... | 0+ tl′ = 0+ (node k kc bl tl′ tr)
... | 1+ tl′ with bl
... | ◿ = rotʳ k kc tl′ tr
... | ▽ = 1+ (node k kc ◿ tl′ tr)
... | ◺ = 0+ (node k kc ▽ tl′ tr)
insert v vc f (node k kc bl tl tr) _
| tri≈ _ refl _ =
0+ (node k (f vc kc) bl tl tr)
insert v vc f (node k kc bl tl tr) (_ , u)
| tri> _ _ c with insert v vc f tr (c , u)
... | 0+ tr′ = 0+ (node k kc bl tl tr′)
... | 1+ tr′ with bl
... | ◿ = 0+ (node k kc ▽ tl tr′)
... | ▽ = 1+ (node k kc ◺ tl tr′)
... | ◺ = rotˡ k kc tl tr′
```

Overall, I’ve been enjoying programming in Agda. The things I liked and didn’t like surprised me:

- Editor Support
Is excellent. I use spacemacs, and the whole thing worked pretty seamlessly. Proof search and auto was maybe not as powerful as Idris’, although that might be down to lack of experience (note—as I write this, I see you can enable case-splitting in proof search, so it looks like I was right about my lack of experience). In many ways, it was much better than Haskell’s editor support: personally, I have never managed to get case-splitting to work in my Haskell setup, never mind some of the fancier features that you get in Agda.

It’s worth noting that my experience with Idris is similar: maybe it’s something about dependent types?

Of course, I missed lots of extra tools, like linters, code formatters, etc., but the tight integration with the compiler was so useful it more than made up for it.

Also, I’d implore anyone who’s had trouble with emacs before to give spacemacs a go. It works well out-of-the-box, and has a system for keybinding discovery that

*actually works*.- Documentation
Pretty good, considering. There are some missing parts (rewriting and telescopes are both stubs on the documentation site), but there seemed to be more fully worked-out examples available online for different concepts when I needed to figure them out.

Now, the thing about a lot of these complaints/commendations (*especially* with regards to tooling and personal setups) is that people tend to be pretty bad about evaluating how difficult finicky tasks like editor setups are. Once you’ve gotten the hang of some of this stuff, you forget that you ever didn’t. Agda is the second dependently-typed language I’ve really gone for a deepish dive on, and I’ve been using spacemacs for a while, so YMMV.

One area of the language itself that I would have liked to see more on was irrelevance. Looking back at the definition of the tree type, in the Haskell version there’s no singleton storing the height (the balance type stores all the information we need), which means that it definitely doesn’t exist at runtime. As I understand it, that implies that the type should be irrelevant in the equivalent Agda. However, when I actually mark it as irrelevant, everything works fine, except that missing cases warnings start showing up. I couldn’t figure out why: Haskell was able to infer full case coverage without the index, after all. Equality proof erasure, also: is it safe? Consistent?

All in all, I’d encourage more Haskellers to give Agda a try. It’s fun, interesting, and $\mathcal{Unicode}$!

No “deletion is left as an exercise to the reader” here, no sir! Fuller implementations of both the Haskell and Agda versions of the code here are available: first, a pdf of the Agda code with lovely colours is here. The accompanying repository is here, and the equivalent for the Haskell code is here. Of course, if you would rather read something by someone who knows what they’re talking about, please see the

Danielsson, Nils Anders. 2018. “The Agda standard library.”

McBride, Conor Thomas. 2014. “How to Keep Your Neighbours in Order.” In *Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming*, 297–309. ICFP ’14. New York, NY, USA: ACM. doi:10.1145/2628136.2628163.

Pfaff, Ben. 2004. “Performance Analysis of BSTs in System Software.” In *Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems*, 410–411. SIGMETRICS ’04/Performance ’04. New York, NY, USA: ACM. doi:10.1145/1005686.1005742.

Weirich, Stephanie. 2014. “Depending on Types.” In *Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming*, 241–241. ICFP ’14. New York, NY, USA: ACM. doi:10.1145/2628136.2631168.

My phrasing is maybe a little confusing here. When

`Set`

“has the type”`Set₁`

it means that`Set`

is*in*`Set₁`

, not the other way around.↩

Tags: Haskell, Probability

Here are the slides for a short talk I gave to a reading group I’m in at Harvard today. The speaker notes are included in the pdf, code and the tex is available in the repository.

]]>
Tags: Probability, Haskell

Ever since the famous pearl by Erwig and Kollmansberger (2006), probabilistic programming with monads has been an interesting and diverse area in functional programming, with many different approaches.

I’m going to present five here, some of which I have not seen before.

As presented in the paper, a simple and elegant formulation of probability distributions looks like this:

It’s a list of possible events, each tagged with their probability of happening. Here’s the probability distribution representing a die roll, for instance:

The semantics can afford to be a little fuzzy: it doesn’t hugely matter if the probabilities don’t add up to 1 (you can still extract meaningful answers when they don’t). However, I can’t see a way in which either negative probabilities or an empty list would make sense. It would be nice if those states were unrepresentable.

Its monadic structure multiplies conditional events:

```
instance Functor Prob where
fmap f xs = Prob [ (f x, p) | (x,p) <- runProb xs ]
instance Applicative Prob where
pure x = Prob [(x,1)]
fs <*> xs
= Prob
[ (f x,fp*xp)
| (f,fp) <- runProb fs
, (x,xp) <- runProb xs ]
instance Monad Prob where
xs >>= f
= Prob
[ (y,xp*yp)
| (x,xp) <- runProb xs
, (y,yp) <- runProb (f x) ]
```

In most of the examples, we’ll need a few extra functions in order for the types to be useful. First is support:

And second is expectation:

```
expect :: (a -> Rational) -> Prob a -> Rational
expect p xs = sum [ p x * xp | (x,xp) <- runProb xs ]
probOf :: (a -> Bool) -> Prob a -> Rational
probOf p = expect (bool 0 1 . p)
```

It’s useful to be able to construct uniform distributions:

```
uniform xs = Prob [ (x,n) | x <- xs ]
where
n = 1 % toEnum (length xs)
die = uniform [1..6]
>>> probOf (7==) $ do
x <- die
y <- die
pure (x+y)
1 % 6
```

As elegant as the above approach is, it leaves something to be desired when it comes to efficiency. In particular, you’ll see a combinatorial explosion at every step. To demonstrate, let’s take the example above, using three-sided dice instead so it doesn’t take up too much space.

The probability table looks like this:

```
2 1/9
3 2/9
4 1/3
5 2/9
6 1/9
```

But the internal representation looks like this:

```
2 1/9
3 1/9
4 1/9
3 1/9
4 1/9
5 1/9
4 1/9
5 1/9
6 1/9
```

States are duplicated, because the implementation has no way of knowing that two outcomes are the same. We could collapse equivalent outcomes if we used a `Map`

, but then we can’t implement `Functor`

, `Applicative`

, or `Monad`

. The types:

```
class Functor f where
fmap :: (a -> b) -> f a -> f b
class Functor f => Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
class Applicative f => Monad f where
(>>=) :: f a -> (a -> f b) -> f b
```

Don’t allow an `Ord`

constraint, which is what we’d need to remove duplicates. We can instead make our own classes which *do* allow constraints:

```
{-# LANGUAGE RebindableSyntax #-}
{-# LANGUAGE TypeFamilies #-}
import Prelude hiding (Functor(..),Applicative(..),Monad(..))
import Data.Kind
class Functor f where
type Domain f a :: Constraint
type Domain f a = ()
fmap :: Domain f b => (a -> b) -> f a -> f b
class Functor f => Applicative f where
{-# MINIMAL pure, liftA2 #-}
pure :: Domain f a => a -> f a
liftA2 :: Domain f c => (a -> b -> c) -> f a -> f b -> f c
(<*>) :: Domain f b => f (a -> b) -> f a -> f b
(<*>) = liftA2 ($)
class Applicative f => Monad f where
(>>=) :: Domain f b => f a -> (a -> f b) -> f b
fail :: String -> a
fail = error
return :: (Applicative f, Domain f a) => a -> f a
return = pure
```

This setup gets over a couple common annoyances in Haskell, like making `Data.Set`

a Monad:

```
instance Functor Set where
type Domain Set a = Ord a
fmap = Set.map
instance Applicative Set where
pure = Set.singleton
liftA2 f xs ys = do
x <- xs
y <- ys
pure (f x y)
instance Monad Set where
(>>=) = flip foldMap
```

And, of course, the probability monad:

```
newtype Prob a = Prob
{ runProb :: Map a Rational
}
instance Functor Prob where
type Domain Prob a = Ord a
fmap f = Prob . Map.mapKeysWith (+) f . runProb
instance Applicative Prob where
pure x = Prob (Map.singleton x 1)
liftA2 f xs ys = do
x <- xs
y <- ys
pure (f x y)
instance Ord a => Monoid (Prob a) where
mempty = Prob Map.empty
mappend (Prob xs) (Prob ys) = Prob (Map.unionWith (+) xs ys)
instance Monad Prob where
Prob xs >>= f
= Map.foldMapWithKey ((Prob .) . flip (Map.map . (*)) . runProb . f) xs
support = Map.keys . runProb
expect p = getSum . Map.foldMapWithKey (\k v -> Sum (p k * v)) . runProb
probOf p = expect (bool 0 1 . p)
uniform xs = Prob (Map.fromList [ (x,n) | x <- xs ])
where
n = 1 % toEnum (length xs)
ifThenElse True t _ = t
ifThenElse False _ f = f
die = uniform [1..6]
>>> probOf (7==) $ do
x <- die
y <- die
pure (x + y)
1 % 6
```

Coming up with the right implementation all at once is quite difficult: luckily, there are more general techniques for designing DSLs that break the problem into smaller parts, which also give us some insight into the underlying composition of the probability monad.

The technique relies on an algebraic concept called “free objects”. A free object for some class is a minimal implementation of that class. The classic example is lists: they’re the free monoid. Monoid requires that you have an additive operation, an empty element, and that the additive operation be associative. Lists have all of these things: what makes them *free*, though, is that they have nothing else. For instance, the additive operation on lists (concatenation) isn’t commutative: if it was, they wouldn’t be the free monoid any more, because they satisfy an extra law that’s not in monoid.

For our case, we can use the free monad: this takes a functor and gives it a monad instance, in a way we know will satisfy all the laws. This encoding is used in several papers (Ścibior, Ghahramani, and Gordon 2015; Larsen 2011).

The idea is to first figure out what primitive operation you need. We’ll use weighted choice:

Then you encode it as a functor:

We’ll say the left-hand-choice has chance $p$, and the right-hand $1-p$. Then, you just wrap it in the free monad:

And you already have a monad instance. Support comes from the `Foldable`

instance:

Expectation is an “interpreter” for the DSL:

```
expect :: (a -> Rational) -> Prob a -> Rational
expect p = iter f . fmap p
where
f (Choose c l r) = l * c + r * (1-c)
```

For building up the tree, we can use Huffman’s algorithm:

```
fromList :: (a -> Rational) -> [a] -> Prob a
fromList p = go . foldMap (\x -> singleton (p x) (Pure x))
where
go xs = case minView xs of
Nothing -> error "empty list"
Just ((xp,x),ys) -> case minView ys of
Nothing -> x
Just ((yp,y),zs) ->
go (insertHeap (xp+yp) (Free (Choose (xp/(xp+yp)) x y)) zs)
```

And finally, it gets the same notation as before:

```
uniform = fromList (const 1)
die = uniform [1..6]
probOf p = expect (bool 0 1 . p)
>>> probOf (7==) $ do
x <- die
y <- die
pure (x + y)
1 % 6
```

One of the advantages of the free approach is that it’s easy to define multiple interpreters. We could, for instance, write an interpreter that constructs a diagram:

```
>>> drawTree ((,) <$> uniform "abc" <*> uniform "de")
┌('c','d')
┌1 % 2┤
│ └('c','e')
1 % 3┤
│ ┌('a','d')
│ ┌1 % 2┤
│ │ └('a','e')
└1 % 2┤
│ ┌('b','d')
└1 % 2┤
└('b','e')
```

There’s a lot to be said about free objects in category theory, also. Specifically, they’re related to initial and terminal (also called final) objects. The encoding above is initial, the final encoding is simply `Cont`

:

Here, also, we get the monad instance for free. In contrast to previously, expect is free:

Support, though, isn’t possible.

This version is also called the Giry monad: there’s a deep and fascinating theory behind it, which I probably won’t be able to do justice to here. Check out Jared Tobin’s post (2017) for a good deep dive on it.

The branching structure of the tree captures the semantics of the probability monad well, but it doesn’t give us much insight into the original implementation. The question is, how can we deconstruct this:

Eric Kidd (2007) pointed out that the monad is the composition of the writer and list monads:

but that seems unsatisfying: in contrast to the tree-based version, we don’t encode any branching structure, we’re able to have empty distributions, and it has the combinatorial explosion problem.

Adding a weighting to nondeterminism is encapsulated more concretely by the `ListT`

transformer. It looks like this:

It’s a cons-list, with an effect before every layer^{1}.

While this can be used to give us the monad we need, I’ve found that something more like this fits the abstraction better:

It’s a nonempty list, with the first element exposed. Turns out this is very similar to the cofree comonad:

Just like the initial free encoding, we can start with a primitive operation:

And we get all of our instances as well:

```
newtype Prob a
= Prob
{ runProb :: Cofree Perhaps a
} deriving (Functor,Foldable)
instance Comonad Prob where
extract (Prob xs) = extract xs
duplicate (Prob xs) = Prob (fmap Prob (duplicate xs))
foldProb :: (a -> Rational -> b -> b) -> (a -> b) -> Prob a -> b
foldProb f b = r . runProb
where
r (x :< Impossible) = b x
r (x :< WithChance p xs) = f x p (r xs)
uniform :: [a] -> Prob a
uniform (x:xs) = Prob (coiterW f (EnvT (length xs) (x :| xs)))
where
f (EnvT 0 (_ :| [])) = Impossible
f (EnvT n (_ :| (y:ys)))
= WithChance (1 % fromIntegral n) (EnvT (n - 1) (y:|ys))
expect :: (a -> Rational) -> Prob a -> Rational
expect p = foldProb f p
where
f x n xs = (p x * n + xs) / (n + 1)
probOf :: (a -> Bool) -> Prob a -> Rational
probOf p = expect (\x -> if p x then 1 else 0)
instance Applicative Prob where
pure x = Prob (x :< Impossible)
(<*>) = ap
append :: Prob a -> Rational -> Prob a -> Prob a
append = foldProb f (\x y -> Prob . (x :<) . WithChance y . runProb)
where
f e r a p = Prob . (e :<) . WithChance ip . runProb . a op
where
ip = p * r / (p + r + 1)
op = p / (r + 1)
instance Monad Prob where
xs >>= f = foldProb (append . f) f xs
```

We see here that we’re talking about gambling-style odds, rather than probability. I wonder if the two representations are dual somehow?

The application of comonads to streams (`ListT`

) has been explored before (Uustalu and Vene 2005); I wonder if there are any insights to be gleaned from this particular probability comonad.

Erwig, Martin, and Steve Kollmansberger. 2006. “Functional pearls: Probabilistic functional programming in Haskell.” *Journal of Functional Programming* 16 (1): 21–34. doi:10.1017/S0956796805005721.

Kidd, Eric. 2007. “Build your own probability monads.”

Larsen, Ken Friis. 2011. “Memory Efficient Implementation of Probability Monads.”

Ścibior, Adam, Zoubin Ghahramani, and Andrew D. Gordon. 2015. “Practical Probabilistic Programming with Monads.” In *Proceedings of the 2015 ACM SIGPLAN Symposium on Haskell*, 50:165–176. Haskell ’15. New York, NY, USA: ACM. doi:10.1145/2804302.2804317.

Tobin, Jared. 2017. “Implementing the Giry Monad.” *jtobin.io*.

Uustalu, Tarmo, and Varmo Vene. 2005. “The Essence of Dataflow Programming.” In *Proceedings of the Third Asian Conference on Programming Languages and Systems*, 2–18. APLAS’05. Berlin, Heidelberg: Springer-Verlag. doi:10.1007/11575467_2.

Note this is

*not*the same as the`ListT`

in transformers; instead it’s a “ListT done right”.↩

Part 4 of a 5-part series on Breadth-First Traversals

Tags: Haskell

After the last post, Noah Easterly pointed me to their tree-traversals library, and in particular the `Phases`

applicative transformer. It allows you to batch applicative effects to be run together: for the breadth-first traversal, we can batch the effects from each level together, giving us a lovely short solution to the problem.

```
breadthFirst c = runPhasesForwards . go
where
go (x:<xs) = liftA2 (:<) (now (c x)) (delay (traverse go xs))
```

In my efforts to speed this implementation up, I came across a wide and interesting literature on scheduling effects, which I’ll go through a little here.

The first thing that jumps to mind, for me, when I think of “scheduling” is coroutines. These are constructs that let you finely control the order of execution of effects. They’re well explored in Haskell by now, and most libraries will let you do something like the following:

We first print `1`

, then, after a delay, we print `2`

. The `delay`

doesn’t make a difference if we just run the whole thing:

But you can see its effect when we use the `interleave`

combinator:

Hopefully you can see how useful this might be, and the similarity to the `Phases`

construction.

The genealogy of most coroutine libraries in Haskell seems to trace back to Blažević (2011) or Kiselyov (2012): the implementation I have been using in these past few examples (`IterT`

) comes from a slightly different place. Let’s take a quick detour to explore it a little.

In functional programming, there are several constructions for modeling error-like states: `Maybe`

for your nulls, `Either`

for your exceptions. What separates these approaches from the “unsafe” variants (null pointers, unchecked exceptions) is that we can *prove*, in the type system, that the error case is handled correctly.

Conspicuously absent from the usual toolbox for modeling partiality is a way to model *nontermination*. At first glance, it may seem strange to attempt to do so in Haskell. After all, if I have a function of type:

I can prove that I won’t throw any errors (with `Either`

, that is), because the type `Int`

doesn’t contain `Left _`

. I’ve also proved, miraculously, that I won’t make any null dereferences, because `Int`

also doesn’t contain `Nothing`

. I *haven’t* proved, however, that I won’t loop infinitely, because (in Haskell), `Int`

absolutely *does* contain $\bot$.

So we’re somewhat scuppered. On the other hand, While we can’t *prove* termination in Haskell, we can:

- Model it.
- Prove it in something else.

Which is exactly what Venanzio Capretta did in the fascinating (and quite accessible) talk “Partiality is an effect” (Capretta, Altenkirch, and Uustalu 2004)^{1}.

The monad in question looks like this:

We’re writing in Idris for the time being, so that we can prove termination and so on. The “recursive call” to `Iter`

is guarded by the `Inf`

type: this turns on a different kind of totality checking in the compiler. Usually, Idris will prevent you from constructing infinite values. But that’s exactly what we want to do here. Take the little-known function `until`

:

It’s clearly not necessarily total, and the totality checker will complain as such when we try and implement it directly:

But we can use `Iter`

to model that possible totality:

```
until : (a -> Bool) -> (a -> a) -> a -> Iter a
until p f x = if p x then Now x else Later (until p f (f x))
```

Of course, nothing’s for free: when we get the ability to construct infinite values, we lose the ability to consume them.

We get an error on the `run`

function. However, as you would expect, we can run *guarded* iteration: iteration up until some finite point.

```
runUntil : Nat -> Iter a -> Maybe a
runUntil Z _ = Nothing
runUntil (S n) (Now x) = Just x
runUntil (S n) (Later x) = runUntil n x
```

Making our way back to Haskell, we must first—as is the law—add a type parameter, and upgrade our humble monad to a monad transformer:

The semantic meaning of the extra `m`

here is interesting: each layer adds not just a recursive step, or a single iteration, but a single effect. Interpreting things in this way gets us back to the original goal:

The `Later`

constructor above can be translated to a `delay`

function on the transformer:

And using this again, we can write the following incredibly short definition for `unfoldTreeM_BF`

:

```
unfoldTreeM_BF :: Monad m => (b -> m (a, [b])) -> b -> m (Tree a)
unfoldTreeM_BF f = retract . go
where
go b = do
(x,xs) <- lift (f b)
fmap (Node x) (interleave (map (delay . go) xs))
```

It would be nice to bring this back to traversals, but alas, `IterT`

is pretty monad-centric. What’s more, if it’s analogous to `Phases`

it certainly doesn’t look like it:

However, in the documentation for `IterT`

, there’s the following little note:

Where `FreeT`

is the free monad transformer. This seems to strongly hint that we could get the same thing for applicatives with `ApT`

. Let’s try it:

The `Applicative`

instance is a little hairy, but it *seems* correct:

```
instance Applicative f =>
Applicative (Phases f) where
pure = Phases . pure
liftA2 f' (Phases (ApT xs')) (Phases (ApT ys')) =
Phases (ApT (liftA2 (go f') xs' ys'))
where
go
:: ∀ a b c.
(a -> b -> c)
-> ApF Identity f a
-> ApF Identity f b
-> ApF Identity f c
go f (Pure x) ys = fmap (f x) ys
go f xs (Pure y) = fmap (`f` y) xs
go f (Ap x (ApT xs)) (Ap y (ApT ys)) =
Ap
(liftA2 (,) x y)
(ApT (liftA2 (go (\xx yy -> uncurry f . (xx *** yy))) xs ys))
```

(on a side note: thank *goodness* for `liftA2`

finally getting into `Applicative`

)

And we get all the normal combinators:

```
delay :: Applicative f => Phases f a -> Phases f a
delay = Phases . ApT . pure . Ap (pure ()) . fmap const . runPhases
lift :: Functor f => f a -> Phases f a
lift = Phases . liftApO
```

The issue comes with running the thing at the end: `Monad`

creeps back in.

```
retract :: Monad f => Phases f a -> f a
retract = fmap (runIdentity . retractAp) . joinApT . runPhases
```

Because the effects are all layered on top of each other, you need to flatten them out at the end, which requires `join`

. Mind you, it does work: it’s just not as general as it could be.

All’s not lost, though. Turns out, we never needed the transformer in the first place: we could just define the different applicative instance straight off.

```
newtype Phases f a = Phases
{ runPhases :: Ap f a
} deriving Functor
instance Applicative f =>
Applicative (Phases f) where
pure = Phases . Pure
liftA2 f' (Phases xs') (Phases ys') = Phases (go f' xs' ys')
where
go :: ∀ a b c.
(a -> b -> c)
-> Ap f a
-> Ap f b
-> Ap f c
go f (Pure x) ys = fmap (f x) ys
go f xs (Pure y) = fmap (`f` y) xs
go f (Ap x xs) (Ap y ys) =
Ap
(liftA2 (,) x y)
(go (\xx yy -> uncurry f . (xx *** yy)) xs ys)
delay :: Applicative f => Phases f a -> Phases f a
delay = Phases . Ap (pure ()) . fmap const . runPhases
retract :: Applicative f => Phases f a -> f a
retract = retractAp . runPhases
lift :: f a -> Phases f a
lift = Phases . liftAp
```

In the wonderful article Coroutine Pipelines (Blažević 2011), several different threads on coroutine-like constructions are unified. What I’ve demonstrated above isn’t yet as powerful as what you might see in a full coroutine library: ideally, you’d want generators and sinks. As it turns out, when we look back at the note from `IterT`

:

We can get both of those other constructs by swapping out `Identity`

^{2}:

(`Sink`

is usually called an `Iteratee`

)

This is the fundamental abstraction that underlies things like the pipes library (Gonzalez 2018).

The only missing part from the first coroutine example by now is `interleave`

. In the free library, it has the following signature:

But we should be able to spot that, really, it’s a traversal. And, as a traversal, it should rely on some underlying `Applicative`

instance. Let’s try and come up with one:

```
newtype Parallel m f a = Parallel
{ runParallel :: FreeT m f a
}
instance (Functor f, Functor m) =>
Functor (Parallel m f) where
fmap f = Parallel . FreeT . fmap go . runFreeT . runParallel
where
go = bimap f (FreeT . fmap go . runFreeT)
instance (Applicative f, Applicative m) =>
Applicative (Parallel m f) where
pure = Parallel . FreeT . pure . Pure
Parallel fs' <*> Parallel xs' = Parallel (unw fs' xs')
where
unw (FreeT fs) (FreeT xs) = FreeT (liftA2 go fs xs)
go (Pure f) = bimap f (runParallel . fmap f . Parallel)
go (Free fs) = Free . \case
Pure x -> fmap (runParallel . fmap ($x) . Parallel) fs
Free xs -> liftA2 unw fs xs
```

Now, interleave is just `sequenceA`

!

So we can see that there’s a “parallel” applicative for both the free monad and the free applicative. To try and understand this type a little better, we can leverage our intuition about a much simpler, more familiar setting: lists. There’s an interesting similarity between lists and the free monad: `FreeT ((,) a)`

) looks a lot like “`ListT`

done right” (so much so, in fact, that most coroutine libraries provide their own version of it). More concretely, list also has a famous “parallel” applicative: `ZipList`

!

```
newtype ZipList a
= ZipList
{ getZipList :: [a]
} deriving Functor
instance Applicative ZipList where
pure = ZipList . repeat
liftA2 f (ZipList xs) (ZipList ys) = ZipList (zipWith f xs ys)
```

We’ll use some of our knowledge about `ZipList`

to help us in the next section.

We’ve seen that efforts to model both coroutines and partiality end up in the same neighborhood: there’s yet another way to get there, which seems (at first) almost the opposite of the second. It starts with a blog post from Conor McBride (2009) called “Time flies like an applicative functor”. Curiously, here too breadth-first labeling is the focus. Remember first the lovely circular solution from (**???**):

```
data Tree a = Leaf | Node a (Tree a) (Tree a)
relabel :: Tree x -> [[a]] -> (Tree a, [[a]])
relabel Leaf xss = (Leaf,xss)
relabel (Node _ l r) ((x:xs):xss0) =
let (l',xss1) = relabel l xss0
(r',xss2) = relabel r xss1
in (Node x l' r',xs:xss2)
bflabel :: Tree x -> [a] -> Tree a
bflabel tr xs = u
where
(u,xss) = relabel tr (xs:xss)
```

As lovely as it is, spare a thought for the poor totality checker: it’s hard to imagine how it would even *start* to show that something so lazy and circular would terminate. `IterT`

won’t help us here, either: it can help us express programs that *might* diverge, not weird-looking ones that definitely won’t.

The solution presented is a type (`De`

) which has a limited set of combinators: a fixpoint (`fix :: (De x -> x) -> x`

), and an applicative instance. As long as all problematic recursive calls are instead expressed using those combinators, the termination checker should be satisfied.

`De`

can be thought of as a “delay” wrapper. Values of type `De a`

are one step in the future, `De (De a)`

are two, and so on. This idea was later expanded upon in Atkey (2011) and Atkey and McBride (2013) to *clock variables*. Instead of types with a delay, types are tagged with how much more time they have (something like “fuel” in the Idris sense, maybe). So a value of type $a^\mathsf{K}$ is tagged with time $\mathsf{K}$, effectively meaning “I have $\mathsf{K}$ productive steps left before I diverge”. “Productive steps” will mean something different for every data type: for lists, it could mean that it can produce up until the $\mathsf{K}$th cons-cell. In the paper (Atkey and McBride 2013) this is fleshed out a little more, with fixpoint combinators and so on. As a concrete example, take the type of the cons operator on streams:

It increments the clock on the type, saying that it has one more productive step than it did before. This is kind of the opposite of a “delay”: previously, the scheduling types have meant “this is available $\mathsf{K}$ number of steps in the future” rather than “this is available for another $\mathsf{K}$ steps”. We can still describe delays in this system, though, using the $\rhd^\mathsf{K}$ notation:

$\begin{equation} \text{Cons} : \text{a} \rightarrow \rhd^\mathsf{K}\text{Stream a} \rightarrow \text{Stream a} \end{equation}$Let’s first try express some of this in the free monad:

```
data K = Z | S K
data Delay :: K -> (Type -> Type) -> (Type -> Type) -> Type -> Type where
Now :: a -> Delay n f m a
Later :: f (DelayT n f m a) -> Delay (S n) f m a
instance (Functor f, Functor m) => Functor (Delay n f m) where
fmap f (Now x) = Now (f x)
fmap f (Later xs) = Later (fmap (fmap f) xs)
newtype DelayT n f m a = DelayT { runDelayT :: m (Delay n f m a) }
instance (Functor f, Functor m) =>
Functor (DelayT n f m) where
fmap f = DelayT . fmap (fmap f) . runDelayT
```

We can straight away express one of the combinators from the paper, `force`

:

```
force :: Functor m => (∀ k. DelayT k f m a) -> m a
force (DelayT xs) = fmap f xs
where
f :: Delay Z f m a -> a
f (Now x) = x
```

Similar trick to `runST`

here: if the type is delayed however long we want it to be, then it mustn’t really be delayed at all.

Next, remember that we have types for streams (generators) from the `IterT`

monad:

And cons does indeed have the right type:

```
cons :: Applicative m => a -> Stream n a m b -> Stream (S n) a m b
cons x xs = DelayT (pure (Later (x,xs)))
```

We also get an applicative:

```
instance (Applicative f, Applicative m) =>
Applicative (DelayT n f m) where
pure = DelayT . pure . Now
DelayT fs' <*> DelayT xs' = DelayT (liftA2 go fs' xs')
where
go :: ∀ k a b. Delay k f m (a -> b) -> Delay k f m a -> Delay k f m b
go (Now f) = fmap f
go (Later fs) = Later . \case
Now x -> fmap (fmap ($x)) fs
Later xs -> liftA2 (<*>) fs xs
```

Now, I’m not sure how much this stuff actually corresponds to the paper, but what caught my eye is the statement that `De`

is a classic “applicative-not-monad”: just like `ZipList`

. However, under the analogy that the free monad is listy, and the parallel construction is ziplist-y, what we have in the `DelayT`

is the equivalent of a length-indexed list. These have an applicative instance similar to ziplists: but they also have a monad. Can we apply the same trick here?

There’s a lot of fascinating stuff out there—about clock variables, especially—that I hope to get a chance to learn about once I get a chance. What I’m particularly interested to follow up on includes:

- Comonads and their relationship to these constructions. Streams are naturally expressed as comonads, could they be used as a basis on which to build a similar “delay” mechanism?
- I’d love to explore more efficient implementations like the ones in Spivey (2017).
- I’m interested to see the relationship between these types, power series, and algebras for combinatorial search (Spivey 2009).

Atkey, Robert. 2011. “How to be a Productive Programmer - by putting things off until tomorrow.” Heriot-Watt University.

Atkey, Robert, and Conor McBride. 2013. “Productive coprogramming with guarded recursion.” In, 197. ACM Press. doi:10.1145/2500365.2500597.

Blažević, Mario. 2011. “Coroutine Pipelines.” *The Monad.Reader* 19 (19) (August): 29–50.

Capretta, Venanzio, Thorsten Altenkirch, and Tarmo Uustalu. 2004. “Partiality is an effect.” In *Dependently Typed Programming*, 04381:20. Dagstuhl Seminar Proceedings. Dagstuhl, Germany: Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany.

Gonzalez, Gabriel. 2018. “Pipes: Compositional pipelines.”

Kiselyov, Oleg. 2012. “Iteratees.” In *Proceedings of the 11th International Conference on Functional and Logic Programming*, 166–181. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-29822-6_15.

McBride, Conor. 2009. “Time flies like an applicative functor.” *Epilogue for Epigram*.

Spivey, J. Michael. 2009. “Algebras for combinatorial search.” *Journal of Functional Programming* 19 (3-4) (July): 469–487. doi:10.1017/S0956796809007321.

Spivey, Michael. 2017. “Faster coroutine pipelines.” *Proceedings of the ACM on Programming Languages* 1 (ICFP) (August): 1–23. doi:10.1145/3110249.

There is a later, seemingly more formal version of the talk available (

**???**), but the one from 2004 was a little easier for me to understand, and had a lot more Haskell code.↩Small note:

`(,) a`

and`(->) a`

are adjunct. I wonder if there is any implication from this? Certainly, producers and consumers seem adjunct, but there’s no instance I can find for it in adjunctions.↩

Part 3 of a 5-part series on Breadth-First Traversals

Tags: Haskell

After looking at the algorithms I posted last time, I noticed some patterns emerging which I thought deserved a slightly longer post. I’ll go through the problem (Gibbons 2015) in a little more detail, and present some more algorithms to go along with it.

The original question was posed by Etian Chatav:

What is the correct way to write breadth first traversal of a

`[Tree]`

?

The breadth-first traversal here is a traversal in the lensy sense, i.e:

The `Tree`

type we’re referring to here is a rose tree; we can take the one defined in `Data.Tree`

:

Finally, instead of solving the (somewhat intermediate) problem of traversing a forest, we’ll look directly at traversing the tree itself. In other words, our solution should have the type:

As in Gibbons (2015), let’s first look at just converting the tree to a list in breadth-first order. In other words, given the tree:

```
┌3
┌2┤
│ └4
1┤
│ ┌6
└5┤
└7
```

We want the list:

Last time I looked at this problem, the function I arrived at was as follows:

```
breadthFirstEnumerate :: Tree a -> [a]
breadthFirstEnumerate ts = f ts b []
where
f (Node x xs) fw bw = x : fw (xs : bw)
b [] = []
b qs = foldl (foldr f) b qs []
```

It’s admittedly a little difficult to understand, but it’s really not too complex: we’re popping items off the front of a queue, and pushing the subforest onto the end. `fw`

is the recursive call here: that’s where we send the queue with the element pushed on. Even though it may *look* like we’re pushing onto the front (as we’re using a cons), this is really the *end* of the queue, since it’s being consumed in reverse, with `foldl`

.

We can compare it to the technique used in Allison (2006) and Smith (2009), where it’s called *corecursive queues*. Breadth-first enumeration is accomplished as follows in Smith (2009):

```
levelOrder :: Tree a -> [a]
levelOrder tr = map rootLabel qs
where
qs = enqs [tr] 0 qs
enqs [] n xs = deq n xs
enqs (t:ts) n xs = t : enqs ts (n+1) xs
deq 0 _ = []
deq n (x:xs) = enqs (subForest x) (n-1) xs
```

We get to avoid tracking the length of the queue, however.

Before we go the full way to traversal, we can try add a little structure to our breadth-first enumeration, by delimiting between levels in the tree. We want our function to have the following type:

Looking back at our example tree:

```
┌3
┌2┤
│ └4
1┤
│ ┌6
└5┤
└7
```

We now want the list:

This function is strictly more powerful than `breadthFirstEnumerate`

, as we can define one in terms of the other:

It’s also just a generally useful function, so there are several example implementations available online.

The one provided in Data.Tree is as follows:

Pretty nice, but it looks to me like it’s doing a lot of redundant work. We could write it as an unfold:

```
levels t = unfoldr (f . concat) [[t]]
where
f [] = Nothing
f xs = Just (unzip [(y,ys) | Node y ys <- xs])
```

The performance danger here lies in `unzip`

: one could potentially optimize that for a speedup.

Another definition, in the style of `breadthFirstEnumerate`

above, is as follows:

```
levels ts = f b ts [] []
where
f k (Node x xs) ls qs = k (x : ls) (xs : qs)
b _ [] = []
b k qs = k : foldl (foldl f) b qs [] []
```

Here, we maintain a stack building up the current level, as well as a queue that we send to the next level. Because we’re consing onto the front of the stack, the subforest needs to be traversed in reverse, to build up the output list in the right order. This is why we’re using a second `foldl`

here, whereas the original had `foldr`

on the inner loop.

Looking at the implicit queue version, I noticed that it’s just using a church-encoded pair to reverse the direction of the fold. Instead of doing both reversals, we can use a normal pair, and run it in one direction:

```
levels ts = b (f ts ([],[]))
where
f (Node x xs) (ls,qs) = (x:ls,xs:qs)
b (_,[]) = []
b (k,qs) = k : b (foldr (flip (foldr f)) ([],[]) qs)
```

Secondly, we’re running a fold on the second component of the pair: why not run the fold immediately, rather than building the intermediate list. In fact, we’re running a fold over the *whole* thing, which we can do straight away:

```
levels ts = f ts []
where
f (Node x xs) (q:qs) = (x:q) : foldr f qs xs
f (Node x xs) [] = [x] : foldr f [] xs
```

After looking at it for a while, I realized it’s similar to an inlined version of the algorithm presented in Gibbons (2015):

```
levels t = [rootLabel t] : foldr (lzw (++)) [] (map levels (subForest t))
where
lzw f (x:xs) (y:ys) = f x y : lzw f xs ys
lzw _ xs [] = xs
lzw _ [] ys = ys
```

Before going any further, all of the functions so far can be redefined to work on the cofree comonad:

When `f`

is specialized to `[]`

, we get the original rose tree. So far, though, all we actually require is `Foldable`

.

From now on, then, we’ll use `Cofree`

instead of `Tree`

.

Finally, we can begin on the traversal itself. We know how to execute the effects in the right order, what’s missing is to build the tree back up in the right order.

First thing we’ll use is a trick with `Traversable`

, where we fill a container from a list. In other words:

With the state monad (or applicative, in this case, I suppose), we can define a “pop” action, which takes an element from the supply:

And then we `traverse`

that action over our container:

When we use fill, it’ll have the following type:

```
breadthFirst :: (Applicative f, Traversable t)
=> (a -> f b) -> Cofree t a -> f (Cofree t b)
breadthFirst = ...
where
...
fill :: t (Cofree t a) -> State [Cofree t b] (t (Cofree t b))
fill = traverse (const pop)
```

Hopefully that makes sense: we’re going to get the subforest from here:

And we’re going to fill it with the result of the traversal, which changes the contents from `a`

s to `b`

s.

One of the nice things about working with applicatives is that they compose, in a variety of different ways. In other words, if I have one effect, `f`

, and another `g`

, and I want to run them both on the contents of some list, I can do it in one pass, either by layering the effects, or putting them side-by-side.

In our case, we need to deal with two effects: the one generated by the traversal, (the one the caller wants to use), and the internal state we’re using to fill up the forests in our tree. We could use `Compose`

explicitly, but we can avoid some calls to `pure`

if we write the combinators we’re going to use directly:

```
map2
:: (Functor f, Functor g)
=> (a -> b -> c) -> f a -> g b -> f (g c)
map2 f x xs =
fmap (\y -> fmap (f y) xs) x
app2
:: (Applicative f, Applicative g)
=> (a -> b -> c -> d) -> f a -> g b -> f (g c) -> f (g d)
app2 f x xs =
liftA2 (\y -> liftA2 (f y) xs) x
```

The outer applicative (`f`

) will be the user’s effect, the inner will be `State`

.

First we’ll try convert the zippy-style `levels`

to a traversal. First, convert the function over to the cofree comonad:

```
levels tr = f tr []
where
f (x:<xs) (q:qs) = (x:q) : foldr f qs xs
f (x:<xs) [] = [x] : foldr f [] xs
```

Next, instead of building up a list of just the root labels, we’ll pair them with the subforests:

```
breadthFirst tr = f tr []
where
f (x:<xs) (q:qs) = ((x,xs):q) : foldr f qs xs
f (x:<xs) [] = [(x,xs)] : foldr f [] xs
```

Next, we’ll fill the subforests:

```
breadthFirst tr = f tr []
where
f (x:<xs) (q:qs) = ((x,fill xs):q) : foldr f qs xs
f (x:<xs) [] = [(x,fill xs)] : foldr f [] xs
```

Then, we can run the applicative effect on the root label:

```
breadthFirst c tr = f tr []
where
f (x:<xs) (q:qs) = ((c x,fill xs):q) : foldr f qs xs
f (x:<xs) [] = [(c x,fill xs)] : foldr f [] xs
```

Now, to combine the effects, we can use the combinators we defined before:

```
breadthFirst c tr = f tr []
where
f (x:<xs) (q:qs) =
app2 (\y ys zs -> (y:<ys) : zs) (c x) (fill xs) q : foldr f qs xs
f (x:<xs) [] =
map2 (\y ys -> [y:<ys]) (c x) (fill xs) : foldr f [] xs
```

This builds a list containing all of the level-wise traversals of the tree. To collapse them into one, we can use a fold:

```
breadthFirst :: (Traversable t, Applicative f)
=> (a -> f b)
-> Cofree t a
-> f (Cofree t b)
breadthFirst c tr =
head <$> foldr (liftA2 evalState) (pure []) (f tr [])
where
f (x:<xs) (q:qs) =
app2 (\y ys zs -> (y:<ys):zs) (c x) (fill xs) q : foldr f qs xs
f (x:<xs) [] =
map2 (\y ys -> [y:<ys]) (c x) (fill xs) : foldr f [] xs
```

Converting the queue-based implementation is easy once we’ve done it with the zippy one. The result is (to my eye) a little easier to read, also:

```
breadthFirst
:: (Applicative f, Traversable t)
=> (a -> f b) -> Cofree t a -> f (Cofree t b)
breadthFirst c tr =
fmap head (f b tr e [])
where
f k (x:<xs) ls qs =
k (app2 (\y ys zs -> (y:<ys):zs) (c x) (fill xs) ls) (xs:qs)
b _ [] = pure []
b l qs = liftA2 evalState l (foldl (foldl f) b qs e [])
e = pure (pure [])
```

There are a couple things to notice here: first, we’re not using `map2`

anywhere. That’s because in the zippy version we were able to notice when the queue was exhausted, so we could just output the singleton effect. Here, instead, we’re using `pure (pure [])`

: this is potentially a source of inefficiency, as `liftA2 f (pure x) y`

is less efficient than `fmap (f x) y`

for some applicatives.

On the other hand, we don’t build up a list of levels to be combined with `foldr (liftA2 evalState)`

at any point: we combine them at every level immediately. You may be able to do the same in the zippy version, but I haven’t figured it out yet.

The final point to make here is to do with the very last thing we do in the traversal: `fmap head`

. Strictly speaking, any `fmap`

s in the code should be unnecessary: we *should* be able to fuse them all with any call to `liftA2`

. This transformation is often called the “Yoneda embedding”. We can use it here like so:

```
breadthFirst
:: ∀ t a f b. (Traversable t, Applicative f)
=> (a -> f b) -> Cofree t a -> f (Cofree t b)
breadthFirst c tr = f (b head) tr e []
where
f k (x:<xs) ls qs =
k (app2 (\y ys zs -> (y:<ys) : zs) (c x) (fill xs) ls) (xs : qs)
b :: ∀ x. ([Cofree t b] -> x)
-> f (State [Cofree t b] [Cofree t b])
-> [t (Cofree t a)]
-> f x
b k _ [] = pure (k [])
b k l qs =
liftA2 (\x -> k . evalState x) l (foldl (foldl f) (b id) qs e [])
e = pure (pure [])
```

Notice that we need scoped type variables here, since the type of `b`

changes depending on when it’s called.

Transforming the iterative version is slightly different from the other two:

```
breadthFirst c tr = fmap head (go [tr])
where
go [] = pure []
go xs =
liftA2
evalState
(getCompose (traverse f xs))
(go (foldr (\(_:<ys) b -> foldr (:) b ys) [] xs))
f (x:<xs) = Compose (map2 (:<) (c x) (fill xs))
```

We’re using `Compose`

directly here, in contrast to the other two algorithms.

Performance-wise, no one algorithm wins out in every case. For enumeration, the zippy algorithm is the fastest in most cases—except when the tree had a large branching factor; then, the iterative algorithm wins out. For the traversals, the iterative algorithm is usually better—except for monads with more expensive applicative instances.

I’m still not convinced that the zippy traversal is as optimized as it could be, however. If anyone has a better implementation, I’d love to see it!

Using the composability of applicatives, we can fuse several operations over traversables into one pass. Unfortunately, however, this can often introduce a memory overhead that makes the whole operation slower overall. One such example is the iterative algorithm above:

```
breadthFirst c tr = fmap head (go [tr])
where
go [] = pure []
go xs = liftA2 evalState zs (go (ys []))
where
Compose (Endo ys,Compose zs) = traverse f xs
f (x :< xs) =
Compose
(Endo (flip (foldr (:)) xs)
,Compose (map2 (:<) (c x) (fill xs)))
```

We only traverse the subforest of each node once now, fusing the fill operation with building the list to send to the recursive call. This is expensive (especially memory-wise), though, and traversing the descendant is cheap; the result is that the one-pass version is slower (in my tests).

The cofree comonad allows us to generalize over the type of “descendants”—from lists (in `Tree`

) to anything traversable. We could also generalize over the type of the traversal itself: given a way to access the descendants of a node, we should be able to traverse all nodes in a breadth-first order. This kind of thing is usually accomplished by Plated: it’s a class that gives you a traversal over the immediate descendants of some recursive type. Adapting the iterative version is relatively simple:

```
breadthFirstOf :: Traversal' a a -> Traversal' a a
breadthFirstOf trav c tr = fmap head (go [tr])
where
go [] = pure []
go xs =
liftA2
evalState
(getCompose (traverse f xs))
(go (foldr (\ys b -> foldrOf trav (:) b ys) [] xs))
f xs = Compose (fmap fill (c xs))
fill = trav (const (State (\(x:xs) -> (x, xs))))
```

We can use this version to get back some of the old functions above:

```
breadthFirstEnumerate :: Traversable f => Cofree f a -> [a]
breadthFirstEnumerate = toListOf (breadthFirstOf plate . _extract)
```

Building a tree breadth-first, monadically, is still an unsolved problem (it looks like: Feuer 2015).

Using some of these we can implement a monadic breadth-first unfold for the cofree comonad:

```
unfoldM :: (Monad m, Traversable t)
=> (b -> m (a, t b))
-> b
-> m (Cofree t a)
unfoldM c tr = go head [tr]
where
go k [] = pure (k [])
go k xs = do
ys <- traverse c xs
go (k . evalState (traverse f ys)) (toList (Compose (Compose ys)))
f (x,xs) = fmap (x:<) (fill xs)
```

Allison, Lloyd. 2006. “Circular Programs and Self-Referential Structures.” *Software: Practice and Experience* 19 (2) (October 30): 99–109. doi:10.1002/spe.4380190202. http://users.monash.edu/~lloyd/tildeFP/1989SPE/.

Feuer, David. 2015. “Is a lazy, breadth-first monadic rose tree unfold possible?” Question. *Stack Overflow*. https://stackoverflow.com/q/27748526.

Gibbons, Jeremy. 2015. “Breadth-First Traversal.” *Patterns in Functional Programming*. https://patternsinfp.wordpress.com/2015/03/05/breadth-first-traversal/.

Smith, Leon P. 2009. “Lloyd Allison’s Corecursive Queues: Why Continuations Matter.” *The Monad.Reader*, July 29. https://meldingmonads.files.wordpress.com/2009/06/corecqueues.pdf.

Part 2 of a 5-part series on Breadth-First Traversals

Tags: Haskell

I was looking again at the issue of writing breadth-first traversals for rose trees, and in particular the problem explored in Gibbons (2015). The breadth-first traversal here is a traversal in the lensy sense.

First, let’s look back at getting the levels out of the tree. Here’s the old function I arrived at last time:

```
levels :: Forest a -> [[a]]
levels ts = foldl f b ts [] []
where
f k (Node x xs) ls qs = k (x : ls) (xs : qs)
b _ [] = []
b k qs = k : foldl (foldl f) b qs [] []
```

After wrangling the definition a little, I got to the following (much cleaner) definition:

```
levels :: Tree a -> [[a]]
levels tr = f tr [] where
f (Node x xs) (y:ys) = (x:y) : foldr f ys xs
f (Node x xs) [] = [x] : foldr f [] xs
```

Before going any further, all of the functions so far can be redefined to work on the cofree comonad:

When `f`

is specialized to `[]`

, we get the original rose tree. But what we actually require is much less specific: `levels`

, for instance, only needs `Foldable`

.

```
levelsCofree :: Foldable f => Cofree f a -> [[a]]
levelsCofree tr = f tr []
where
f (x:<xs) (y:ys) = (x:y) : foldr f ys xs
f (x:<xs) [] = [x] : foldr f [] xs
```

Using this, we can write the efficient breadth-first traversal:

```
breadthFirst
:: (Applicative f, Traversable t)
=> (a -> f b) -> Cofree t a -> f (Cofree t b)
breadthFirst c (t:<ts) =
liftA2 evalState (map2 (:<) (c t) (fill ts)) chld
where
chld = foldr (liftA2 evalState) (pure []) (foldr f [] ts)
fill = traverse (const (state (\(x:xs) -> (x,xs))))
f (x:<xs) (q:qs)
= app2 (\y ys zs -> (y:<ys) : zs) (c x) (fill xs) q
: foldr f qs xs
f (x:<xs) []
= map2 (\y ys -> [y:<ys]) (c x) (fill xs)
: foldr f [] xs
map2 k x xs = fmap (\y -> fmap (k y) xs) x
app2 k x xs = liftA2 (\y -> liftA2 (k y) xs) x
```

At every level, the subforest’s shape it taken (`fill`

), and it’s traversed recursively. We can fuse these two steps into one:

```
breadthFirst
:: (Traversable t, Applicative f)
=> (a -> f b) -> Cofree t a -> f (Cofree t b)
breadthFirst c (t:<ts) =
liftA2
evalState
(map2 (:<) (c t) fill)
(foldr (liftA2 evalState) (pure []) (chld []))
where
Compose (Endo chld,fill) = go ts
go = traverse (\x -> Compose (Endo (f x), state (\(y:ys) -> (y,ys))))
f (x:<xs) (q:qs) = app2 (\y ys zs -> (y:<ys) : zs) (c x) r q : rs qs
where Compose (Endo rs,r) = go xs
f (x:<xs) [] = map2 (\y ys -> [y:<ys]) (c x) r : rs []
where Compose (Endo rs,r) = go xs
map2 k x xs = fmap (\y -> fmap (k y) xs) x
app2 k x xs = liftA2 (\y -> liftA2 (k y) xs) x
```

The overhead from this approach scraps any benefit, though.

*Patterns in Functional Programming*. https://patternsinfp.wordpress.com/2015/03/05/breadth-first-traversal/.

Part 1 of a 1-part series on Sorting

Tags: Haskell, Algorithms

I was working on some performance-intensive stuff recently, and I ran into the issue of sorting very small amounts of values (think 3, 4, 5).

The standard way to do this is with sorting networks. The way I’ll be using doesn’t actually perform any parallelism (unfortunately), but it is a clean way to write the networks in Haskell without too much repetition.

This website will generate an optimal sorting network for your given size, and the output (for 3) looks like this:

```
[[1,2]]
[[0,2]]
[[0,1]]
```

Each pair of indices represents a “compare-and-swap” operation: so the first line means “compare the value at 1 to the value at 2: if it’s bigger, swap them”. For 5, the network looks like this:

```
[[0,1],[2,3]]
[[0,2],[1,3]]
[[1,2],[0,4]]
[[1,4]]
[[2,4]]
[[3,4]]
```

Pairs on the same line can be performed in parallel.

For our case, I’m going to be looking at sorting tuples, but the technique can easily be generalized to vectors, etc.

The first trick is to figure out how to do “swapping”: we don’t want mutation, so what we can do instead is swap the *reference* to some value, by shadowing its name. In other words:

```
swap2 :: (a -> a -> Bool) -> a -> a -> (a, a)
swap2 lte x y | lte x y = (x, y)
| otherwise = (y, x)
sort3 :: (a -> a -> Bool) -> (a,a,a) -> (a,a,a)
sort3 lte (_0,_1,_2)
= case swap2 lte _1 _2 of
(_1, _2) -> case swap2 lte _0 _2 of
(_0, _2) -> case swap2 lte _0 _1 of
(_0, _1) -> (_0, _1, _2)
```

The indentation is hard to read, though, and wrapping-and-unwrapping tuples makes me nervous about the performance (although it may be inlined). The next step is to *church-encode* the pairs returned:

```
swap2 :: (a -> a -> Bool) -> a -> a -> (a -> a -> b) -> b
swap2 lte x y k
| lte x y = k x y
| otherwise = k y x
sort3 :: (a -> a -> Bool) -> (a,a,a) -> (a,a,a)
sort3 lte (_0,_1,_2)
= swap2 lte _1 _2 $ \ _1 _2 ->
swap2 lte _0 _2 $ \ _0 _2 ->
swap2 lte _0 _1 $ \ _0 _1 ->
(_0,_1,_2)
```

Then, to get this to compile down to efficient code, we can make judicious use of `inline`

from GHC.Exts:

```
import GHC.Exts (inline)
swap2 :: (a -> a -> Bool) -> a -> a -> (a -> a -> b) -> b
swap2 lte x y k
| inline lte x y = inline k x y
| otherwise = inline k y x
{-# INLINE swap2 #-}
sort3 :: (a -> a -> Bool) -> (a, a, a) -> (a, a, a)
sort3 lte (_0,_1,_2)
= swap2 lte _1 _2 $ \ _1 _2 ->
swap2 lte _0 _2 $ \ _0 _2 ->
swap2 lte _0 _1 $ \ _0 _1 ->
(_0,_1,_2)
{-# INLINE sort3 #-}
```

And to see if this really does make efficient code, let’s look at the core (cleaned up):

```
sort3
= \ (lte :: a -> a -> Bool)
(ds :: (a, a, a)) ->
case ds of wild_X8 (_0, _1, _2) ->
case lte _1 _2 of
False ->
case lte _0 _1 of
False -> (_2, _1, _0)
True ->
case lte _0 _2 of
False -> (_2, _0, _1)
True -> (_0, _2, _1)
True ->
case lte _0 _2 of
False ->
case lte _2 _1 of
False -> (_1, _2, _0)
True -> (_2, _1, _0)
True ->
case lte _0 _1 of
False -> (_1, _0, _2)
True -> wild_X8
```

Fantastic! When we specialize to `Int`

, we get all of the proper unpacking:

Core (with just the variable names cleaned up this time):

```
sort3Int
= \ (w :: (Int, Int, Int)) ->
case w of w_X { (_0, _1, _2) ->
case _0 of w_0 { GHC.Types.I# _0U ->
case _1 of w_1 { GHC.Types.I# _1U ->
case _2 of w_2 { GHC.Types.I# _2U ->
case GHC.Prim.<=# _1U _2U of {
__DEFAULT ->
case GHC.Prim.<=# _0U _1U of {
__DEFAULT -> (w_2, w_1, w_0);
1# ->
case GHC.Prim.<=# _0U _2U of {
__DEFAULT -> (w_2, w_0, w_1);
1# -> (w_0, w_2, w_1)
}
};
1# ->
case GHC.Prim.<=# _0U _2U of {
__DEFAULT ->
case GHC.Prim.<=# _2U _1U of {
__DEFAULT -> (w_1, w_2, w_0);
1# -> (w_2, w_1, w_0)
};
1# ->
case GHC.Prim.<=# _0U _1U of {
__DEFAULT -> (w_1, w_0, w_2);
1# -> w_X
}
}
}
}
}
}
}
```

Now, for the real test: sorting 5-tuples, using the network above.

```
sort5 :: (a -> a -> Bool) -> (a,a,a,a,a) -> (a,a,a,a,a)
sort5 lte (_0,_1,_2,_3,_4)
= swap2 lte _0 _1 $ \ _0 _1 ->
swap2 lte _2 _3 $ \ _2 _3 ->
swap2 lte _0 _2 $ \ _0 _2 ->
swap2 lte _1 _3 $ \ _1 _3 ->
swap2 lte _1 _2 $ \ _1 _2 ->
swap2 lte _0 _4 $ \ _0 _4 ->
swap2 lte _1 _4 $ \ _1 _4 ->
swap2 lte _2 _4 $ \ _2 _4 ->
swap2 lte _3 _4 $ \ _3 _4 ->
(_0,_1,_2,_3,_4)
{-# INLINE sort5 #-}
```

The core output from this is over 1000 lines long: you can see it (with the variable names cleaned up) here.

In my benchmarks, these functions are indeed quicker than their counterparts in vector, but I’m not confident in my knowledge of Haskell performance to make much of a strong statement about them.

]]>
Tags: Haskell, Dependent Types

The code from this post is available as a gist.

One of the most basic tools for use in type-level programming is the Peano definition of the natural numbers:

Using the new `TypeFamilyDependencies`

extension, these numbers can be used to describe the “size” of some type. I’m going to use the proportion symbol here:

Using this type family we can describe induction on the natural numbers:

```
class Finite n where
induction ∷ t ∝ Z → (∀ k. t ∝ k → t ∝ S k) → t ∝ n
instance Finite Z where
induction z _ = z
{-# inline induction #-}
instance Finite n ⇒ Finite (S n) where
induction z s = s (induction z s)
{-# inline induction #-}
```

The `induction`

function reads as the standard mathematical definition of induction: given a proof (value) of the zero case, and a proof that any proof is true for its successor, we can give you a proof of the case for any finite number.

An added bonus here is that the size of something can usually be resolved at compile-time, so any inductive function on it should also be resolved at compile time.

We can use it to provide the standard instances for basic length-indexed lists:

Some instances for those lists are easy:

However, for `Applicative`

, we need some way to recurse on the size of the list. This is where induction comes in.

This lets us write `pure`

in a pleasingly simple way:

But can we also write `<*>`

using induction? Yes! Because we’ve factored out the induction itself, we just need to describe the notion of a “sized” function:

Then we can write `<*>`

as so:

```
instance Finite n ⇒
Applicative (List n) where
pure x = induction Nil (x :-)
(<*>) =
induction
(\Nil Nil → Nil)
(\k (f :- fs) (x :- xs) → f x :- k fs xs)
```

What about the `Monad`

instance? For that, we need a little bit of plumbing: the type signature of `>>=`

is:

One of the parameters (the second `a`

) doesn’t have a size: we’ll need to work around that, with `Const`

:

Using this, we can write our `Monad`

instance:

```
head' ∷ List (S n) a → a
head' (x :- _) = x
tail' ∷ List (S n) a → List n a
tail' (_ :- xs) = xs
instance Finite n ⇒
Monad (List n) where
xs >>= (f ∷ a → List n b) =
induction
(\Nil _ → Nil)
(\k (y :- ys) fn → head' (fn (Const y)) :-
k ys (tail' . fn . Const . getConst))
xs
(f . getConst ∷ Const a n → List n b)
```

Getting the above to work actually took a surprising amount of work: the crux is that the `∝`

type family needs to be injective, so the “successor” proof can typecheck. Unfortunately, this means that every type can only have one notion of “size”. What I’d prefer is to be able to pass in a function indicating exactly *how* to get the size out of a type, that could change depending on the situation. So we could recurse on the first argument of a function, for instance, or just its second, or just the result. This would need either type-level lambdas (which would be cool), or generalized type family dependencies.

Tags: Haskell, Pattern Synonyms

Pattern Synonyms is an excellent extension for Haskell. There are some very cool examples of their use out there, and I thought I’d add to the list.

Lists are *the* fundamental data structure for functional programmers. Unfortunately, once more specialized structures are required, you often have to switch over to an uncomfortable, annoying API which isn’t as pleasant or fun to use as cons and nil. With pattern synonyms, though, that’s not so! For instance, here’s how you would do it with a run-length-encoded list:

```
data List a
= Nil
| ConsN {-# UNPACK #-} !Int
a
(List a)
cons :: Eq a => a -> List a -> List a
cons x (ConsN i y ys)
| x == y = ConsN (i+1) y ys
cons x xs = ConsN 1 x xs
uncons :: List a -> Maybe (a, List a)
uncons Nil = Nothing
uncons (ConsN 1 x xs) = Just (x, xs)
uncons (ConsN n x xs) = Just (x, ConsN (n-1) x xs)
infixr 5 :-
pattern (:-) :: Eq a => a -> List a -> List a
pattern x :- xs <- (uncons -> Just (x, xs))
where
x :- xs = cons x xs
{-# COMPLETE Nil, (:-) #-}
zip :: List a -> List b -> List (a,b)
zip (x :- xs) (y :- ys) = (x,y) :- zip xs ys
zip _ _ = Nil
```

A little more useful would be to do the same with a heap:

```
data Tree a
= Leaf
| Node a (Tree a) (Tree a)
smerge :: Ord a => Tree a -> Tree a -> Tree a
smerge Leaf ys = ys
smerge xs Leaf = xs
smerge h1@(Node x lx rx) h2@(Node y ly ry)
| x <= y = Node x (smerge h2 rx) lx
| otherwise = Node y (smerge h1 ry) ly
cons :: Ord a => a -> Tree a -> Tree a
cons x = smerge (Node x Leaf Leaf)
uncons :: Ord a => Tree a -> Maybe (a, Tree a)
uncons Leaf = Nothing
uncons (Node x l r) = Just (x, smerge l r)
infixr 5 :-
pattern (:-) :: Ord a => a -> Tree a -> Tree a
pattern x :- xs <- (uncons -> Just (x, xs))
where
x :- xs = cons x xs
{-# COMPLETE Leaf, (:-) #-}
sort :: Ord a => [a] -> [a]
sort = go . foldr (:-) Leaf
where
go Leaf = []
go (x :- xs) = x : go xs
```

In fact, this pattern can be generalized, so *any* container-like-thing with a cons-like-thing can be modified as you would with lists. You can see the generalization in lens.

One of the most confusing things I remember about learning Haskell early-on was that the vast majority of the Monads examples didn’t work, because they were written pre-transformers. In other words, the state monad was defined like so:

But in transformers nowadays (which is where you get `State`

from if you import it in the normal way), the definition is:

This results in some *very* confusing error messages when you try run example code.

However, we can pretend that the change never happened, with a simple pattern synonym:

```
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }
type State s = StateT s Identity
runState :: State s a -> s -> (a, s)
runState xs = runIdentity . runStateT xs
pattern State :: (s -> (a, s)) -> State s a
pattern State x <- (runState -> x)
where
State x = StateT (Identity . x)
```

If you want to write type-level proofs on numbers, you’ll probably end up using Peano numerals and singletons:

```
data Nat = Z | S Nat
data Natty n where
Zy :: Natty Z
Sy :: Natty n -> Natty (S n)
type family (+) (n :: Nat) (m :: Nat) :: Nat where
Z + m = m
S n + m = S (n + m)
plusZeroIsZero :: Natty n -> n + Z :~: n
plusZeroIsZero Zy = Refl
plusZeroIsZero (Sy n) = case plusZeroIsZero n of
Refl -> Refl
```

Pretty cool, right? We can even erase the proof (if we really trust it) using rewrite rules:

This isn’t *ideal*, but it’s getting there.

However, if we ever want to use these things at runtime (perhaps as a type-level indication of some data structure’s size), we’re going to rely on the value-level Peano addition, which is bad news.

Not so with pattern synonyms!

```
data family The k :: k -> Type
class Sing (a :: k) where sing :: The k (a :: k)
data Nat = Z | S Nat
newtype instance The Nat n = NatSing Natural
instance Sing Z where
sing = NatSing 0
instance Sing n => Sing (S n) where
sing =
(coerce :: (Natural -> Natural) -> (The Nat n -> The Nat (S n)))
succ sing
data Natty n where
ZZy :: Natty Z
SSy :: The Nat n -> Natty (S n)
getNatty :: The Nat n -> Natty n
getNatty (NatSing n :: The Nat n) = case n of
0 -> gcastWith (unsafeCoerce Refl :: n :~: Z) ZZy
_ -> gcastWith (unsafeCoerce Refl :: n :~: S m) (SSy (NatSing (pred n)))
pattern Zy :: () => (n ~ Z) => The Nat n
pattern Zy <- (getNatty -> ZZy) where Zy = NatSing 0
pattern Sy :: () => (n ~ S m) => The Nat m -> The Nat n
pattern Sy x <- (getNatty -> SSy x) where Sy (NatSing x) = NatSing (succ x)
{-# COMPLETE Zy, Sy #-}
type family (+) (n :: Nat) (m :: Nat) :: Nat where
Z + m = m
S n + m = S (n + m)
-- | Efficient addition, with type-level proof.
add :: The Nat n -> The Nat m -> The Nat (n + m)
add = (coerce :: (Natural -> Natural -> Natural)
-> The Nat n -> The Nat m -> The Nat (n + m)) (+)
-- | Proof on efficient representation.
addZeroRight :: The Nat n -> n + Z :~: n
addZeroRight Zy = Refl
addZeroRight (Sy n) = gcastWith (addZeroRight n) Refl
```

(unfortunately, incomplete pattern warnings don’t work here)

So you’ve got a tree type:

And you’ve spent some time writing a (reasonably difficult) function on the tree:

```
showTree :: Show a => Tree a -> String
showTree Tip = ""
showTree (Bin x' ls' rs') = go True id xlen' ls'
$ showString xshw'
$ endc ls' rs'
$ showChar '\n'
$ go False id xlen' rs' ""
where
xshw' = show x'
xlen' = length xshw'
go _ _ _ Tip = id
go up k i (Bin x ls rs) = branch True ls
. k
. pad i
. showChar (bool '└' '┌' up)
. showString xshw
. endc ls rs
. showChar '\n'
. branch False rs
where
xshw = show x
xlen = length xshw
branch d
| d == up = go d (k . pad i) (xlen + 1)
| otherwise = go d (k . pad i . showChar '│') xlen
endc Tip Tip = id
endc Bin {} Tip = showChar '┘'
endc Tip Bin {} = showChar '┐'
endc Bin {} Bin {} = showChar '┤'
pad = (++) . flip replicate ' '
```

But, for some reason or another, you need to add a field to your `Bin`

constructor, to store the size of the subtree (for instance). Does this function have to change? No! Simply change the tree definition as so:

```
data Tree a
= Tip
| Bin' Int a (Tree a) (Tree a)
pattern Bin x ls rs <- Bin' n x ls rs
{-# COMPLETE Tip, Bin #-}
```

And all the old code works!

This gets to the core of pattern synonyms: it’s another tool which we can use to separate implementation from API.

Say you’ve got a data type that has certain constraints on what values it can hold. You’re not writing a paper for ICFP, so expressing those constraints as a beautiful type isn’t required: you just want to only export the constructor and accessors, and write some tests to make sure that those functions always obey the constraints.

But once you do this you’ve lost something: pattern-matching. Let’s get it back with pattern synonyms!

As our simple example, our constraint is going to be “A list where the values are always ordered”:

]]>
Tags: Haskell, Algorithms

There’s a popular UK TV show called Countdown with a round where contestants have to get as close to some target number as possible by constructing an arithmetic expression from six random numbers.

You don’t have to use all of the numbers, and you’re allowed use four operations: addition, subtraction, multiplication, and division. Additionally, each stage of the calculation must result in a positive integer.

Here’s an example. Try get to the target 586:

$100,25,1,5,3,10$

On the show, contestants get 30 seconds to think of an answer.

Solving it in Haskell was first explored in depth in Hutton (2002). There, a basic “generate-and-test” implementation was provided and proven correct.

As an optimization problem, there are several factors which will influence the choice of algorithm:

- There’s no obvious heuristic for constructing subexpressions in order to get to a final result. In other words, if we have $25 * 3 + 10$ and $25 * 3 * 10$, there’s no easy way to tell which is “closer” to $586$. The latter is closer numerically, but the former is what we ended up using in the solution.
- Because certain subexpressions aren’t allowed, we’ll be able to prune the search space as we go.
- Ideally, we’d only want to calculate each possible subexpression once, making it a pretty standard dynamic programming problem.

I’ll be focusing on the third point in this post, but we can add the second point in at the end. First, however, let’s write a naive implementation.

I can’t think of a simpler way to solve the problem than generate-and-test, so we’ll work from there. Testing is easy (`(target ==) . eval`

), so we’ll focus on generation. The core function we’ll use for this is usually called “unmerges”:

```
unmerges [x,y] = [([x],[y])]
unmerges (x:xs) =
([x],xs) :
concat
[ [(x:ys,zs),(ys,x:zs)]
| (ys,zs) <- unmerges xs ]
unmerges _ = []
```

It generates all possible 2-partitions of a list, ignoring order:

I haven’t looked much into how to optimize this function or make it nicer, as we’ll be swapping it out later.

Next, we need to make the recursive calls:

```
allExprs :: (a -> a -> [a]) -> [a] -> [a]
allExprs _ [x] = [x]
allExprs c xs =
[ e
| (ys,zs) <- unmerges xs
, y <- allExprs c ys
, z <- allExprs c zs
, e <- c y z ]
```

Finally, using the simple-reflect library, we can take a look at the output:

```
>>> allExprs (\x y -> [x+y,x*y]) [1,2] :: [Expr]
[1 + 2,1 * 2]
>>> allExprs (\x y -> [x+y]) [1,2,3] :: [Expr]
[1 + (2 + 3),1 + 2 + 3,2 + (1 + 3)]
```

Even at this early stage, we can actually already write a rudimentary solution:

```
countdown :: [Integer] -> Integer -> [Expr]
countdown xs targ =
filter
((==) targ . toInteger)
(allExprs
(\x y -> [x,y,x+y,x*y])
(map fromInteger xs))
>>> mapM_ print (countdown [100,25,1,5,3,10] 586)
1 + (100 * 5 + (25 * 3 + 10))
1 + (100 * 5 + 25 * 3 + 10)
1 + (25 * 3 + (100 * 5 + 10))
1 + 100 * 5 + (25 * 3 + 10)
100 * 5 + (1 + (25 * 3 + 10))
100 * 5 + (1 + 25 * 3 + 10)
100 * 5 + (25 * 3 + (1 + 10))
1 + (100 * 5 + 25 * 3) + 10
1 + 100 * 5 + 25 * 3 + 10
100 * 5 + (1 + 25 * 3) + 10
100 * 5 + 25 * 3 + (1 + 10)
1 + 25 * 3 + (100 * 5 + 10)
25 * 3 + (1 + (100 * 5 + 10))
25 * 3 + (1 + 100 * 5 + 10)
25 * 3 + (100 * 5 + (1 + 10))
```

As you can see from the output, there’s a lot of repetition. We’ll need to do some memoization to speed it up.

The normal way most programmers think about “memoization” is something like this:

```
memo_dict = {0:0,1:1}
def fib(n):
if n in memo_dict:
return memo_dict[n]
else:
res = fib(n-1) + fib(n-2)
memo_dict[n] = res
return res
```

In other words, it’s a fundamentally stateful process. We need to mutate some mapping when we haven’t seen the argument before.

Using laziness, though, we can emulate the same behavior purely. Instead of mutating the mapping on function calls, we fill the whole thing at the beginning, and then index into it. As long as the mapping is lazy, it’ll only evaluate the function calls when they’re needed. We could use lists as our mapping to the natural numbers:

The benefit here is that we avoid the extra work of redundant calls. However, we pay for the speedup in three ways:

- Space: we need to take up memory space storing the cached solutions.
- Indexing: while we no longer have to pay for the expensive recursive calls, we
*do*now have to pay for indexing into the data structure. In this example, we’re paying linear time to index into the list. - Generality: the memoization is tied directly to the argument type to the function. We need to be able to use the argument to our memoized function as an index into some data structure. While a lot of argument types admit some type of indexing (whether they’re
`Hashable`

,`Ord`

, etc.), some don’t, and we can’t memoize those using this technique.

We’re going to look at a technique that allow us to somewhat mitigate 2 and 3 above, using something called a *nexus*.

The standard technique of memoization is focused on the arguments to the function, creating a concrete representation of them in memory to map to the results. Using nexuses, as described in Bird and Hinze (2003), we’ll instead focus on the function itself, creating a concrete representation of its call graph in memory. Here’s the call graph of Fibonacci:

```
┌fib(1)=1
┌fib(2)=1┤
│ └fib(0)=0
┌fib(3)=2┤
│ └fib(1)=1
┌fib(4)=3┤
│ │ ┌fib(1)=1
│ └fib(2)=1┤
│ └fib(0)=0
┌fib(5)=5┤
│ │ ┌fib(1)=1
│ │ ┌fib(2)=1┤
│ │ │ └fib(0)=0
│ └fib(3)=2┤
│ └fib(1)=1
fib(6)=8┤
│ ┌fib(1)=1
│ ┌fib(2)=1┤
│ │ └fib(0)=0
│ ┌fib(3)=2┤
│ │ └fib(1)=1
└fib(4)=3┤
│ ┌fib(1)=1
└fib(2)=1┤
└fib(0)=0
```

Turning *that* into a concrete datatype wouldn’t do us much good: it still has the massively redundant computations in it. However, we can recognize that entire subtrees are duplicates of each other: in those cases, instead of creating both subtrees, we could just create one and have each parent point to it^{1}:

```
┌fib(5)=5┬────────┬fib(3)=2┬────────┬fib(1)=1
fib(6)=8┤ │ │ │ │
└────────┴fib(4)=3┴────────┴fib(2)=1┴fib(0)=0
```

This is a nexus. In Haskell, it’s not observably different from the other form, except that it takes up significantly less space. It’s also much quicker to construct.

If we use it to memoize `fib`

, we’ll no longer be indexing on the argument: we’ll instead follow the relevant branch in the tree to the subcomputation, which is just chasing a pointer. It also means the argument doesn’t have to be constrained to any specific type. Here’s how you’d do it:

```
data Tree
= Leaf
| Node
{ val :: Integer
, left :: Tree
, right :: Tree}
fib :: Integer -> Integer
fib = val . go
where
go 0 = Node 0 Leaf Leaf
go 1 = Node 1 (Node 0 Leaf Leaf) Leaf
go n = node t (left t) where t = go (n-1)
node l r = Node (val l + val r) l r
```

So this approach sounds amazing, right? No constraints on the argument type, no need to pay for indexing: why doesn’t everyone use it everywhere? The main reason is that figuring out a nexus for the call-graph is *hard*. In fact, finding an optimal one is NP-hard in general (Steffen and Giegerich 2006).

The second problem is that it’s difficult to abstract out. The standard technique of memoization relies on building a mapping from keys to values: about as bread-and-butter as it gets in programming. Even more, we already know how to say “values of this type can be used efficiently as keys in some mapping”: for Data.Map it’s `Ord`

, for Data.HashMap it’s `Hashable`

. All of this together means we can build a nice library for memoization which exports the two following functions:

Building a nexus, however, is not bread-and-butter. On top of that, it’s difficult to say something like “recursive functions of this structure can be constructed using a nexus”. What’s the typeclass for that? In comparison to the signatures above, the constraint will need to be on the *arrows*, not the `a`

. Even talking about the structure of recursive functions is regarded as somewhat of an advanced subject: that said, the recursion-schemes package allows us to do so, and even has facilities for constructing something *like* nexuses with histomorphisms (Tobin 2016). I’m still looking to see if there’s a library out there that *does* manage to abstract nexuses in an ergonomic way, so I’d love to hear if there was one (or if there’s some more generalized form which accomplishes the same).

That’s enough preamble. The nexus we want to construct for countdown is *not* going to memoize as much as possible: in particular, we’re only going to memoize the shape of the trees, not the operators used. This will massively reduce the memory overhead, and still give a decent speedup (Bird and Mu 2005, 11 “building a skeleton tree first”).

With that in mind, the ideal nexus looks something like this:

We can represent the tree in Haskell as a rose tree:

Constructing the nexus itself isn’t actually the most interesting part of this solution: *consuming* it is. We need to be able to go from the structure above into a list that’s the equivalent of `unmerges`

. Doing a breadth-first traversal of the diagram above (without the top element) will give us:

$abc, abd, acd, bcd, ab, ac, bc, ad, bd, cd, a, b, c, d$

If you split that list in half, and zip it with its reverse, you’ll get the output of `unmerges`

.

However, the breadth-first traversal of the diagram isn’t the same thing as the breadth-first traversal of the rose tree. The latter will traverse $abc, abd, acd, bcd$, and then the children of $abc$ ($ab,ac,bc$), and then the children of $abd$ ($ab,ad,bd$): and here’s our problem. We traverse $ab$ twice, because we can’t know that $abc$ and $abd$ are pointing to the same value. What we have to do is first prune the tree, removing duplicates, and then perform a breadth-first traversal on that.

Luckily, the duplicates follow a pattern, allowing us to remove them without having to do any equality checking. In each row, the first node has no duplicates in its children, the second’s first child is a duplicate, the third’s first and second children are duplicates, and so on. You should be able to see this in the diagram above. Adapting a little from the paper, we get an algorithm like this:

```
para :: (a -> [a] -> b -> b) -> b -> [a] -> b
para f b = go
where
go [] = b
go (x:xs) = f x xs (go xs)
prune :: Forest a -> Forest a
prune ts = pruneAt ts 0
where
pruneAt = para f (const [])
f (Node x []) t _ _ = Node x [] : t
f (Node x us) _ a k =
Node x (pruneAt (drop k us) k) : a (k + 1)
```

I went through this in a previous post, so this is the end solution:

```
breadthFirst :: Forest a -> [a]
breadthFirst ts = foldr f b ts []
where
f (Node x xs) fw bw = x : fw (xs:bw)
b [] = []
b q = foldl (foldr f) b q []
```

With the appropriate incantations, this is actually the fastest implementation I’ve found.

We can actually inline both of the above functions, fusing them together:

```
spanNexus :: Forest a -> [a]
spanNexus ts = foldr f (const b) ts 0 []
where
f (Node x us) fw k bw = x : fw (k+1) ((drop k us, k) : bw)
b [] = []
b qs = foldl (uncurry . foldr f . const) b qs []
```

So, now we can go from the tree to our list of splits. Next step is to convert that list into the output of unmerges, by zipping the reverse of the first half with the second. We can use an algorithm described in Danvy and Goldberg (2005) to do the zipping and reversing:

```
fold xs n = go xs n (const [])
where
go xs 0 k = k xs
go (x:xs) n k = go xs (n-2) (\(y:ys) -> (x,y) : k ys)
```

And we can inline the function which collapses those results into one:

```
fold xs n = go xs n (const [])
where
go 0 xss k = k xss
go n (xs:xss) k =
go (n-2) xss (\(ys:yss) -> [ z
| x <- xs
, y <- ys
, z <- cmb x y
] ++ k yss)
```

And that’s all we need!

```
import qualified Data.Tree as Rose
data Tree a
= Leaf Int a
| Node [Tree a]
deriving (Show,Eq,Functor)
enumerateTrees :: (a -> a -> [a]) -> [a] -> [a]
enumerateTrees _ [] = []
enumerateTrees cmb xs = (extract . steps . initial) xs
where
step = map nodes . group
steps [x] = x
steps xs = steps (step xs)
initial = map (Leaf 1 . flip Rose.Node [] . pure)
extract (Leaf _ x) = Rose.rootLabel x
extract (Node [x]) = extract x
group [_] = []
group (Leaf _ x:vs) = Node [Leaf 2 [x, y] | Leaf _ y <- vs] : group vs
group (Node u:vs) = Node (zipWith comb (group u) vs) : group vs
comb (Leaf n xs) (Leaf _ x) = Leaf (n + 1) (xs ++ [x])
comb (Node us) (Node vs) = Node (zipWith comb us vs)
forest ts = foldr f (const b) ts 0 []
where
f (Rose.Node x []) fw !k bw = x : fw (k + 1) bw
f (Rose.Node x us) fw !k bw = x : fw (k + 1) ((drop k us, k) : bw)
b [] = []
b qs = foldl (uncurry . foldr f . const) b qs []
nodes (Leaf n x) = Leaf 1 (node n x)
nodes (Node xs) = Node (map nodes xs)
node n ts = Rose.Node (walk (2 ^ n - 2) (forest ts) (const [])) ts
where
walk 0 xss k = k xss
walk n (xs:xss) k =
walk (n-2) xss (\(ys:yss) -> [ z
| x <- xs
, y <- ys
, z <- cmb x y
] ++ k yss)
```

The first thing to do for the Countdown solution is to figure out a representation for expressions. The one from simple-reflect is perfect for displaying the result, but we should memoize its calculation.

Then, some helpers for building:

```
data Op = Add | Dif | Mul | Div
binOp f g x y = Memoed ((f `on` expr) x y) ((g `on` result) x y)
apply :: Op -> Memoed -> Memoed -> Memoed
apply Add x y = binOp (+) (+) x y
apply Dif x y
| result y < result x = binOp (-) (-) x y
| otherwise = binOp (-) (-) y x
apply Mul x y = binOp (*) (*) x y
apply Div x y = binOp div div x y
```

Finally, the full algorithm:

```
enumerateExprs :: [Int] -> [Memoed]
enumerateExprs = enumerateTrees cmb . map (\x -> Memoed (fromIntegral x) x)
where
cmb x y =
nubs $
x :
y :
[ apply op x y
| op <- [Add, Dif, Mul, Div]
, legal op (result x) (result y) ]
legal Add _ _ = True
legal Dif x y = x /= y
legal Mul _ _ = True
legal Div x y = x `mod` y == 0
nubs xs = foldr f (const []) xs IntSet.empty
where
f e a s
| IntSet.member (result e) s = a s
| otherwise = e : a (IntSet.insert (result e) s)
countdown :: Int -> [Int] -> [Expr]
countdown targ = map expr . filter ((==) targ . result) . enumerateExprs
>>> (mapM_ print . reduction . head) (countdown 586 [100,25,1,5,3,10])
25 * 3 + 1 + (100 * 5 + 10)
75 + 1 + (100 * 5 + 10)
76 + (100 * 5 + 10)
76 + (500 + 10)
76 + 510
586
```

There are some optimizations going on here, taken mainly from Bird and Mu (2005):

- We filter out illegal operations, as described originally.
- We filter out any expressions that have the same value.

So we’ve followed the paper, written the code: time to test. The specification of the function is relatively simple: calculate all applications of the commutative operator to some input, *without* recalculating subtrees.

We’ll need a free structure for the “commutative operator”:

Here’s the problem: it’s not commutative! We can remedy it by only exporting a constructor that creates the tree in a commutative way, and we can make it a pattern synonym so it looks normal:

```
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE PatternSynonyms #-}
module Commutative
(Tree(Leaf)
,pattern (:*:))
where
data Tree a
= Leaf a
| Tree a :^: Tree a
deriving (Eq,Ord,Show,Foldable)
pattern (:*:) :: Ord a => Tree a -> Tree a -> Tree a
pattern xs :*: ys <- xs :^: ys where
xs :*: ys
| xs <= ys = xs :^: ys
| otherwise = ys :^: xs
{-# COMPLETE Leaf, (:*:) #-}
```

Now we need to check if all applications are actually tested. First, to generate all trees:

```
allTrees :: Ord a => [a] -> Set (Tree a)
allTrees [x] = Set.singleton (Leaf x)
allTrees xs = Set.unions (map (uncurry f) (unmerges xs))
where
f ls rs = Set.fromList ((liftA2 (:*:) `on` (Set.toList . allTrees)) ls rs)
allSubTrees :: Ord a => [a] -> Set (Tree a)
allSubTrees [x] = Set.singleton (Leaf x)
allSubTrees xs =
Set.unions (map (uncurry f . (allSubTrees *** allSubTrees)) (unmerges xs))
where
f ls rs =
Set.unions
[ls, rs, Set.fromList ((liftA2 (:*:) `on` Set.toList) ls rs)]
```

Then, to test:

```
prop_exhaustiveSearch :: Natural -> Bool
prop_exhaustiveSearch n =
let src = [0 .. fromIntegral n]
expect = allSubTrees src
actual =
Set.fromList
(enumerateTrees
(\xs ys ->
[xs, ys, xs :*: ys])
(map Leaf src))
in expect == actual
prop_exhaustiveSearchFull :: Natural -> Bool
prop_exhaustiveSearchFull n =
let src = [0 .. fromIntegral n]
expect = Map.fromSet (const 1) (allTrees src)
actual =
freqs
(enumerateTrees
(\xs ys -> [xs :*: ys])
(map Leaf src))
in expect == actual
```

Testing for repeated calls is more tricky. Remember, the memoization is supposed to be unobservable: in order to see it, we’re going to have to use some unsafe operations.

```
traceSubsequences
:: ((Tree Int -> Tree Int -> [Tree Int]) -> [Tree Int] -> [Tree Int])
-> [Int]
-> (Map (Tree Int) Int, [Tree Int])
traceSubsequences enm ints =
runST $
do ref <- newSTRef Map.empty
let res = enm (combine ref) (map (conv ref) ints)
traverse_ (foldr seq (pure ())) res
intm <- readSTRef ref
pure (intm, res)
where
combine ref xs ys = unsafeRunST ([xs :*: ys] <$ modifySTRef' ref (incr (xs :*: ys)))
{-# NOINLINE combine #-}
conv ref x = unsafeRunST (Leaf x <$ modifySTRef' ref (incr (Leaf x)))
{-# NOINLINE conv #-}
unsafeRunST cmp = unsafePerformIO (unsafeSTToIO cmp)
prop_noRepeatedCalls :: Property
prop_noRepeatedCalls =
property $ sized $
\n ->
pure $
let src = [0 .. n]
(tint,tres) = fmap freqs (traceSubsequences enumerateTrees src)
(fint,fres) = fmap freqs (traceSubsequences dummyEnumerate src)
in counterexample
(mapCompare (freqs (allSubTrees src)) tint)
(all (1 ==) tint) .&&.
counterexample (mapCompare tres fres) (tres == fres) .&&.
(n > 2 ==> tint /= fint)
```

Here, `dummyEnumerate`

is some method which performs the same task, but *doesn’t* construct a nexus, so we can ensure that our tests really do catch faulty implementations.

Bird, Richard, and Ralf Hinze. 2003. “Functional Pearl Trouble Shared is Trouble Halved.” In *Proceedings of the 2003 ACM SIGPLAN Workshop on Haskell*, 1–6. Haskell ’03. New York, NY, USA: ACM. doi:10.1145/871895.871896. http://doi.acm.org/10.1145/871895.871896.

Bird, Richard, and Shin-Cheng Mu. 2005. “Countdown: A case study in origami programming.” *Journal of Functional Programming* 15 (05) (August): 679. doi:10.1017/S0956796805005642. http://www.journals.cambridge.org/abstract_S0956796805005642.

Danvy, Olivier, and Mayer Goldberg. 2005. “There and Back Again.” *BRICS Report Series* 12 (3). doi:10.7146/brics.v12i3.21869. https://tidsskrift.dk/brics/article/view/21869.

Hutton, Graham. 2002. “The Countdown Problem.” *J. Funct. Program.* 12 (6) (November): 609–616. doi:10.1017/S0956796801004300. http://www.cs.nott.ac.uk/~pszgmh/countdown.pdf.

Steffen, Peter, and Robert Giegerich. 2006. “Table Design in Dynamic Programming.” *Information and Computation* 204 (9) (September): 1325–1345. doi:10.1016/j.ic.2006.02.006. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.85.601&rep=rep1&type=pdf.

Tobin, Jared. 2016. “Time Traveling Recursion Schemes.” *jtobin.io*. https://jtobin.io/time-traveling-recursion.

If you think that structure looks more like a funny linked list than a tree, that’s because it is. Instead of talking about “left” and “right” branches, we could talk about the first and second elements in a list: in fact, this is exactly what’s happening in the famous

`zipWith`

Fibonacci implementation (in reverse).Or, in my favourite version:

↩

Part 1 of a 5-part series on Breadth-First Traversals

Tags: Haskell

In contrast to the more common binary trees, in a rose tree every node can have any number of children.

One of the important manipulations of this data structure, which forms the basis for several other algorithms, is a breadth-first traversal. I’d like to go through a couple of techniques for implementing it, and how more generally you can often get away with using much simpler data structures if you really pinpoint the API you need from them.

As a general technique, Okasaki (2000) advises that a queue be used:

```
breadthFirst :: Tree a -> [a]
breadthFirst tr = go (singleton tr)
where
go q = case pop q of
Nothing -> []
Just (Node x xs,qs) -> x : go (qs `append` xs)
```

There are three functions left undefined there: `singleton`

, `pop`

, and `append`

. They represent the API of our as-of-yet unimplemented queue, and their complexity will dictate the complexity of the overall algorithm. As a (bad) first choice, we could use simple lists, with the functions defined thus:

Those repeated appends are bad news. The queue needs to be able to support popping from one side and appending from the other, which is something lists absolutely *cannot* do well.

We could swap in a more general queue implementation, possibly using Data.Sequence, or a pair of lists. But these are more complex and general than we need, so let’s try and pare down the requirements a little more.

First, we don’t need a pop: the go function can be expressed as a fold instead. Second, we don’t need *every* append to be immediately stuck into the queue, we can batch them, first appending to a structure that’s efficient for appends, and then converting that to a structure which is efficient for folds. In code:

```
breadthFirst :: Forest a -> [a]
breadthFirst ts = foldr f b ts []
where
f (Node x xs) fw bw = x : fw (xs : bw)
b [] = []
b qs = foldl (foldr f) b qs []
```

We’re consing instead of appending, but the consumption is being done in the correct direction anyway, because of the `foldl`

.

So next step: to get the `levels`

function from Data.Tree. Instead of doing a breadth-first traversal, it returns the nodes at each *level* of the tree. Conceptually, every time we did the reverse above (called `foldl`

), we will do a cons as well:

```
levels :: Forest a -> [[a]]
levels ts = foldl f b ts [] []
where
f k (Node x xs) ls qs = k (x : ls) (xs : qs)
b _ [] = []
b k qs = k : foldl (foldl f) b qs [] []
```

The original reason I started work on these problems was this issue in containers. It concerns the `unfoldTreeM_BF`

function. An early go at rewriting it, inspired by levels above, looks like this:

```
unfoldForestM_BF :: Monad m => (b -> m (a, [b])) -> [b] -> m (Forest a)
unfoldForestM_BF f ts = b [ts] (const id)
where
b [] k = pure (k [] [])
b qs k = foldl (foldr t) b qs [] (\x -> k [] . foldr (uncurry run) id x)
t a fw bw k = do
(x,cs) <- f a
let !n = length cs
fw (cs : bw) (k . (:) (x, n))
run x n xs ys =
case splitAt n ys of
(cs,zs) -> Node x cs : xs zs
```

It basically performs the same this as the levels function, but builds the tree back up in the end using the `run`

function. In order to do that, we store the length of each subforest on line 9, so that each node knows how much to take from each level.

A possible optimization is to stop taking the length. Anything in list processing that takes a length screams “wrong” to me (although it’s not always true!) so I often try to find a way to avoid it. The first option would be to keep the `cs`

on line 8 around, and use *it* as an indicator for the length. That keeps it around longer than strictly necessary, though. The other option is to add a third level: for `breadthFirst`

above, we had one level; for `levels`

, we added another, to indicate the structure of the nodes and their subtrees; here, we can add a third, to maintain that structure when building back up:

```
unfoldForestM_BF :: Monad m => (b -> m (a, [b])) -> [b] -> m (Forest a)
unfoldForestM_BF f ts = b [ts] (\ls -> concat . ls)
where
b [] k = pure (k id [])
b qs k = foldl g b qs [] (\ls -> k id . ls)
g a xs qs k = foldr t (\ls ys -> a ys (k . run ls)) xs [] qs
t a fw xs bw = f a >>= \(x,cs) -> fw (x:xs) (cs:bw)
run x xs = uncurry (:) . foldl go ((,) [] . xs) x
where
go ys y (z:zs) = (Node y z : ys', zs')
where
(ys',zs') = ys zs
```

This unfortunately *slows down* the code.

*Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming*, 131–136. ICFP ’00. New York, NY, USA: ACM. doi:10.1145/351240.351253. https://www.cs.tufts.edu/~nr/cs257/archive/chris-okasaki/breadth-first.pdf.

While working on something else, I figured out a nice Haskell implementation of Huffman coding, and I thought I’d share it here. I’ll go through a few techniques for transforming a multi-pass algorithm into a single-pass one first, and then I’ll show how to use them for Huffman. If you just want to skip to the code, it’s provided at the end.^{1}

The algorithm isn’t single-pass in the sense of Adaptive Huffman Coding: it still uses the normal Huffman algorithm, but the input is transformed in the same traversal that builds the tree to transform it.

There are several techniques for turning multi-pass algorithms into single-pass ones in functional languages. Perhaps the most famous is circular programming: using *laziness* to eliminate a pass. Bird (1984) used this to great effect in solving the repmin problem:

Given a tree of integers, replace every integer with the minimum integer in the tree, in one pass.

For an imperative programmer, the problem is relatively easy: first, write the code to find the minimum value in the tree in the standard way, using a loop and a “smallest so far” accumulator. Then, inside the loop, after updating the accumulator, set the value of the leaf to be a *reference* to the accumulator.

At first, that solution may seem necessarily impure: we’re using global, mutable state to update many things at once. However, as the paper shows, we can claw back purity using laziness:

```
data Tree a = Leaf a | Tree a :*: Tree a
repMin :: Tree Integer -> Tree Integer
repMin xs = ys where
(m, ys) = go xs
go (Leaf x) = (x, Leaf m)
go (xs :*: ys) = (min x y, xs' :*: ys')
where
(x,xs') = go xs
(y,ys') = go ys
```

Let’s say we don’t have laziness at our disposal: are we hosed? No!^{2} Danvy and Goldberg (2005) explore this very issue, by posing the question:

Given two lists, xs and ys, can you zip xs with the reverse of ys in one pass?

The technique used to solve the problem is named “There and Back Again”; it should be clear why from one of the solutions:

```
convolve xs ys = walk xs const where
walk [] k = k [] ys
walk (x:xs) k = walk xs (\r (y:ys) -> k ((x,y) : r) ys)
```

The traversal of one list builds up the function to consume the other. We could write repmin in the same way:

```
repMin = uncurry ($) . go where
go (Leaf x) = (Leaf, x)
go (xs :*: ys) = (\m -> xs' m :*: ys' m, min xm ym) where
(xs',xm) = go xs
(ys',ym) = go ys
```

If you’re doing a lot of appending to some list-like structure, you probably don’t want to use actual lists: you’ll end up traversing the left-hand-side of the append many more times than necessary. A type you can drop in to use instead is difference lists (Hughes 1986):

```
type DList a = [a] -> [a]
rep :: [a] -> DList a
rep = (++)
abs :: DList a -> [a]
abs xs = xs []
append :: DList a -> DList a -> DList a
append = (.)
```

`append`

is $\mathcal{O}(1)$ in this representation. In fact, for any monoid with a slow `mappend`

, you can use the same trick: it’s called the Cayley representation, and available as `Endo`

in Data.Monoid.

```
rep :: Monoid a => a -> Endo a
rep x = Endo (mappend x)
abs :: Monoid a => Endo a -> a
abs (Endo f) = f mempty
instance Monoid (Endo a) where
mempty = Endo id
mappend (Endo f) (Endo g) = Enfo (f . g)
```

You can actually do the same transformation for “monoids” in the categorical sense: applying it to monads, for instance, will give you codensity (Rivas and Jaskelioff 2014).

Looking back—just for a second—to the repmin example, we should be able to spot a pattern we can generalize. There’s really nothing tree-specific about it, so why can’t we apply it to lists? Or other structures, for that matter? It turns out we can: the `mapAccumL`

function is tailor-made to this need:

```
repMin :: Traversable t => t Integer -> t Integer
repMin xs = ys where
(~(Just m), ys) = mapAccumL f Nothing xs
f Nothing x = (Just x, m)
f (Just y) x = (Just (min x y), m)
```

The tilde before the `Just`

ensures this won’t fail on empty input.

Finally, it’s time for the main event. Huffman coding is a *very* multi-pass algorithm, usually. The steps look like this:

- Build a frequency table for each character in the input.
- Build a priority queue from that frequency table.
- Iteratively pop elements and combine them (into Huffman trees) from the queue until there’s only one left.
- That Huffman tree can be used to construct the mapping from items back to their Huffman codes.
- Traverse the input again, using the constructed mapping to replace elements with their codes.

We can’t *skip* any of these steps: we can try perform them all at once, though.

Let’s write the multi-pass version first. We’ll need the frequency table:

And a heap, ordered on the frequencies of its elements (I’m using a skew heap here):

```
data Heap a
= Nil
| Node {-# UNPACK #-} !Int a (Heap a) (Heap a)
instance Monoid (Heap a) where
mappend Nil ys = ys
mappend xs Nil = xs
mappend h1@(Node i x lx rx) h2@(Node j y ly ry)
| i <= j = Node i x (mappend h2 rx) lx
| otherwise = Node j y (mappend h1 ry) ly
mempty = Nil
```

Next, we need to build the tree^{3}. We can use the tree type from above.

```
buildTree :: Map a Int -> Maybe (Tree a)
buildTree = prune . toHeap where
toHeap = Map.foldMapWithKey (\k v -> Node v (Leaf k) Nil Nil)
prune Nil = Nothing
prune (Node i x l r) = case mappend l r of
Nil -> Just x
Node j y l' r' ->
prune (mappend (Node (i+j) (x :*: y) Nil Nil) (mappend l' r'))
```

Then, a way to convert between the tree and a map:

```
toMapping :: Ord a => Tree a -> Map a [Bool]
toMapping (Leaf x) = Map.singleton x []
toMapping (xs :*: ys) =
Map.union (fmap (True:) (toMapping xs)) (fmap (False:) (toMapping ys))
```

And finally, putting the whole thing together:

```
huffman :: Ord a => [a] -> (Maybe (Tree a), [[Bool]])
huffman xs = (tree, map (mapb Map.!) xs) where
freq = frequencies xs
tree = buildTree freq
mapb = maybe Map.empty toMapping tree
```

The first thing to fix is the `toMapping`

function: at every level, it calls `union`

, a complex and expensive operation. However, `union`

and `empty`

form a monoid, so we can use the Cayley representation to reduce the calls to a minimum. Next, we want to get rid of the `fmap`

s: we can do that by assembling a function to perform the `fmap`

as we go, as in `convolve`

^{4}.

```
toMapping :: Ord a => Tree a -> Map a [Bool]
toMapping tree = go tree id Map.empty where
go (Leaf x) k = Map.insert x (k [])
go (xs :*: ys) k =
go xs (k . (:) True) . go ys (k . (:) False)
```

Secondly, we can integrate the `toMapping`

function with the `buildTree`

function, removing another pass:

```
buildTree :: Ord a => Map a Int -> Maybe (Tree a, Map a [Bool])
buildTree = prune . toHeap where
toHeap = Map.foldMapWithKey (\k v -> Node v (Leaf k, leaf k) Nil Nil)
prune Nil = Nothing
prune (Node i x l r) = case mappend l r of
Nil -> Just (fmap (\k -> k id Map.empty) x)
Node j y l' r' ->
prune (mappend (Node (i+j) (cmb x y) Nil Nil) (mappend l' r'))
leaf x k = Map.insert x (k [])
node xs ys k = xs (k . (:) True) . ys (k . (:) False)
cmb (xt,xm) (yt,ym) = (xt :*: yt, node xm ym)
```

Finally, to remove the second pass over the list, we can copy repmin, using `mapAccumL`

to both construct the mapping and apply it to the structure in one go.

```
huffman :: (Ord a, Traversable t) => t a -> (Maybe (Tree a), t [Bool])
huffman xs = (fmap fst tree, ys) where
(freq,ys) = mapAccumL f Map.empty xs
f fm x = (Map.insertWith (+) x 1 fm, mapb Map.! x)
tree = buildTree freq
mapb = maybe Map.empty snd tree
```

And that’s it!

The similarity between the repmin function and the solution above is suggestive: is there a way to *encode* this idea of making a multi-pass algorithm single-pass? Of course! We can use an applicative:

```
data Circular a b c =
Circular !a
(b -> c)
instance Functor (Circular a b) where
fmap f (Circular tally run) = Circular tally (f . run)
instance Monoid a =>
Applicative (Circular a b) where
pure x = Circular mempty (const x)
Circular fl fr <*> Circular xl xr =
Circular
(mappend fl xl)
(\r -> fr r (xr r))
liftHuffman
:: Ord a
=> a -> Circular (Map a Int) (Map a [Bool]) [Bool]
liftHuffman x = Circular (Map.singleton x 1) (Map.! x)
runHuffman
:: Ord a
=> Circular (Map a Int) (Map a [Bool]) r -> (Maybe (Tree a), r)
runHuffman (Circular smry run) =
maybe (Nothing, run Map.empty) (Just *** run) (buildTree smry)
huffman
:: (Ord a, Traversable t)
=> t a -> (Maybe (Tree a), t [Bool])
huffman = runHuffman . traverse liftHuffman
```

Thanks to it being an applicative, you can do all the fun lensy things with it:

```
showBin :: [Bool] -> String
showBin = map (bool '0' '1')
>>> let liftBin = fmap showBin . liftHuffman
>>> (snd . runHuffman . (each.traverse) liftBin) ("abb", "cad", "c")
(["01","11","11"],["00","01","10"],["00"])
```

Bringing us back to the start, it can also let us solve repmin!

```
liftRepMin :: a -> Circular (Option (Min a)) a a
liftRepMin x = Circular (pure (pure x)) id
runRepMin :: Circular (Option (Min a)) a b -> b
runRepMin (Circular m r) = r (case m of
Option (Just (Min x)) -> x)
repMin :: (Ord a, Traversable t) => t a -> t a
repMin = runRepMin . traverse liftRepMin
```

So the `Circular`

type is actually just the product of reader and writer, and is closely related to the sort type.

It’s also related to the `Prescient`

type, which I noticed after I’d written the above.

Bird, Richard, Geraint Jones, and Oege De Moor. 1997. “More haste‚ less speed: Lazy versus eager evaluation.” *Journal of Functional Programming* 7 (5) (September): 541–547. doi:10.1017/S0956796897002827. https://ora.ox.ac.uk/objects/uuid:761a4646-60a2-4622-a1e0-ddea11507d57/datastreams/ATTACHMENT01.

Bird, R. S. 1984. “Using Circular Programs to Eliminate Multiple Traversals of Data.” *Acta Inf.* 21 (3) (October): 239–250. doi:10.1007/BF00264249. http://dx.doi.org/10.1007/BF00264249.

Danvy, Olivier, and Mayer Goldberg. 2005. “There and Back Again.” http://brics.dk/RS/05/3/BRICS-RS-05-3.pdf.

Hughes, R. John Muir. 1986. “A Novel Representation of Lists and Its Application to the Function "Reverse".” *Information Processing Letters* 22 (3) (March): 141–144. doi:10.1016/0020-0190(86)90059-1. http://www.sciencedirect.com/science/article/pii/0020019086900591.

Pippenger, Nicholas. 1997. “Pure Versus Impure Lisp.” *ACM Trans. Program. Lang. Syst.* 19 (2) (March): 223–238. doi:10.1145/244795.244798. http://doi.acm.org/10.1145/244795.244798.

Rivas, Exequiel, and Mauro Jaskelioff. 2014. “Notions of Computation as Monoids.” *arXiv:1406.4823 [cs, math]* (May). http://arxiv.org/abs/1406.4823.

Huffman coding single-pass implementation:

↩`import Data.Map.Strict (Map) import qualified Data.Map.Strict as Map import Data.Traversable (mapAccumL) data Heap a = Nil | Node {-# UNPACK #-} !Int a (Heap a) (Heap a) instance Monoid (Heap a) where mappend Nil ys = ys mappend xs Nil = xs mappend h1@(Node i x lx rx) h2@(Node j y ly ry) | i <= j = Node i x (mappend h2 rx) lx | otherwise = Node j y (mappend h1 ry) ly mempty = Nil data Tree a = Leaf a | Tree a :*: Tree a buildTree :: Ord a => Map a Int -> Maybe (Tree a, Map a [Bool]) buildTree = prune . toHeap where toHeap = Map.foldMapWithKey (\k v -> Node v (Leaf k, leaf k) Nil Nil) prune Nil = Nothing prune (Node i x l r) = case mappend l r of Nil -> Just (fmap (\k -> k id Map.empty) x) Node j y l' r' -> prune (mappend (Node (i+j) (cmb x y) Nil Nil) (mappend l' r')) leaf x k = Map.insert x (k []) node xs ys k = xs (k . (:) True) . ys (k . (:) False) cmb (xt,xm) (yt,ym) = (xt :*: yt, node xm ym) huffman :: (Ord a, Traversable t) => t a -> (Maybe (Tree a), t [Bool]) huffman xs = (fmap fst tree, ys) where (freq,ys) = mapAccumL f Map.empty xs f fm x = (Map.insertWith (+) x 1 fm, mapb Map.! x) tree = buildTree freq mapb = maybe Map.empty snd tree`

Well, that’s a little bit of a lie. In terms of asympostics, Pippenger (1997) stated a problem that could be solved in linear time in impure Lisp, but $\Omega(n \log n)$ in pure Lisp. Bird, Jones, and Moor (1997) then produced an algorithm that could solve the problem in linear time, by using laziness. So, in some cases, laziness will give you asymptotics you can’t get without it (if you want to stay pure).↩

There’s actually a nicer version of the

`buildTree`

function which uses`StateT (Heap a) Maybe`

, but it’s equivalent to this one under the hood, and I though might be a little distracting.↩Something to notice about this function is that it’s going top-down and bottom-up at the same time. Combining the maps (with

`(.)`

) is done bottom-up, but building the codes is top-down. This means the codes are built in reverse order! That’s why the accumulating parameter (`k`

) is a difference list, rather than a normal list. As it happens, if normal lists were used, the function would be slightly more efficient through sharing, but the codes would all be reversed.↩

Tags: Haskell, Applicative

Here’s an old Haskell chestnut:

`filterM (\_ -> [False,True])`

gives the power set of some input list. It’s one of the especially magical demonstrations of monads. From a high-level perspective, it makes sense: for each element in the list, we want it to be present in one output, and not present in another. It’s hard to see how it actually *works*, though. The (old^{1}) source for `filterM`

doesn’t help hugely, either:

```
filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
filterM _ [] = return []
filterM p (x:xs) = do
flg <- p x
ys <- filterM p xs
return (if flg then x:ys else ys)
```

Again, elegant and beautiful (aside from the three-space indent), but opaque. Despite not really getting how it works, I was encouraged by its simplicity to try my hand at some of the other functions from Data.List.

Let’s start with the subject of my last post. Here’s the implementation:

```
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy p xs = build (\c n ->
let f x a q
| q x = (x : ys, zs)
| otherwise = ([], c (x : ys) zs)
where (ys,zs) = a (p x)
in snd (foldr f (const ([], n)) xs (const False)))
```

It translates over pretty readily:

```
groupByM :: Applicative m => (a -> a -> m Bool) -> [a] -> m [[a]]
groupByM p xs =
fmap snd (foldr f (const (pure ([], []))) xs (const (pure (False))))
where
f x a q = liftA2 st (q x) (a (p x)) where
st b (ys,zs)
| b = (x : ys, zs)
| otherwise = ([], (x:ys):zs)
```

Let’s try it with a similar example to `filterM`

:

It gives the partitions of the list!

So these monadic generalisations have been discovered before, several times over. There’s even a package with monadic versions of the functions in Data.List. Exploring this idea with a little more formality is the paper “All Sorts of Permutations” (Christiansen, Danilenko, and Dylus 2016), and accompanying presentation on YouTube. They show that the monadic version of sort produces permutations of the input list, and examine the output from different sorting algorithms. Here’s a couple of their implementations, altered slightly:

```
insertM :: Monad m => (a -> a -> m Bool) -> a -> [a] -> m [a]
insertM _ x [] = pure [x]
insertM p x yys@(y:ys) = do
lte <- p x y
if lte
then pure (x:yys)
else fmap (y:) (insertM p x ys)
insertSortM :: Monad m => (a -> a -> m Bool) -> [a] -> m [a]
insertSortM p = foldrM (insertM p) []
partitionM :: Applicative m => (a -> m Bool) -> [a] -> m ([a],[a])
partitionM p = foldr f (pure ([],[])) where
f x = liftA2 ifStmt (p x) where
ifStmt flg (tr,fl)
| flg = (x:tr,fl)
| otherwise = (tr,x:fl)
quickSortM :: Monad m => (a -> a -> m Bool) -> [a] -> m [a]
quickSortM p [] = pure []
quickSortM p (x:xs) = do
(gt,le) <- partitionM (p x) xs
ls <- quickSortM p le
gs <- quickSortM p gt
pure (ls ++ [x] ++ gs)
>>> insertSortM (\_ _ -> [False,True]) [1,2,3]
[[1,2,3],[1,3,2],[3,1,2],[2,1,3],[2,3,1],[3,2,1]]
>>> quickSortM (\_ _ -> [False,True]) [1,2,3]
[[3,2,1],[2,3,1],[2,1,3],[3,1,2],[1,3,2],[1,2,3]]
```

As it should be easy to see, they’re very concise and elegant, and strongly resemble the pure versions of the algorithms.

So the examples above are very interesting and cool, but they don’t necessarily have a place in real Haskell code. If you wanted to find the permutations, partitions, or power set of a list you’d probably use a more standard implementation. That’s not to say that these monadic functions have no uses, though: especially when coupled with `State`

they yield readable and fast implementations for certain tricky functions. `ordNub`

, for instance:

```
ordNub :: Ord a => [a] -> [a]
ordNub =
flip evalState Set.empty .
filterM
(\x -> do
flg <- gets (Set.notMember x)
when flg (modify (Set.insert x))
pure flg)
```

Alternatively, using a monadic version of `maximumOn`

:

```
maximumOnM :: (Applicative m, Ord b) => (a -> m b) -> [a] -> m (Maybe a)
maximumOnM p = (fmap . fmap) snd . foldl f (pure Nothing)
where
f a e = liftA2 g a (p e)
where
g Nothing q = Just (q, e)
g b@(Just (o, y)) q
| o < q = Just (q, e)
| otherwise = b
```

You can write a one-pass `mostFrequent`

:

```
mostFrequent :: Ord a => [a] -> Maybe a
mostFrequent =
flip evalState Map.empty .
maximumOnM
(\x -> maybe 1 succ <$> state (Map.insertLookupWithKey (const (+)) x 1))
```

One of the nicest things about the paper was the diagrams of decision trees provided for each sorting algorithm. I couldn’t find a library to do that for me, so I had a go at producing my own. First, we’ll need a data type to represent the tree itself:

We’ll say the left branch is “true” and the right “false”. Applicative and monad instances are relatively mechanical^{2}:

```
instance Applicative (DecTree t) where
pure = Pure
Pure f <*> xs = fmap f xs
Choice c ls rs <*> xs = Choice c (ls <*> xs) (rs <*> xs)
instance Monad (DecTree t) where
Pure x >>= f = f x
Choice c ls rs >>= f = Choice c (ls >>= f) (rs >>= f)
```

We can now create a comparator function that constructs one of these trees, and remembers the values it was given:

```
traceCompare :: a -> a -> DecTree (a,a) Bool
traceCompare x y = Choice (x,y) (Pure True) (Pure False)
```

Finally, to draw the tree, I’ll use a function from my binary tree library:

```
printDecTree :: (Show a, Show b) => String -> DecTree (a,a) b -> IO ()
printDecTree rel t = putStr (drawTreeWith id (go t) "") where
go (Pure xs) = Node (show xs) Leaf Leaf
go (Choice (x,y) tr fl) =
Node (show x ++ rel ++ show y) (go tr) (go fl)
```

And we get these really nice diagrams out:

```
>>> (printDecTree "<=" . insertSortM traceCompare) [1,2,3]
┌[1,2,3]
┌1<=2┤
│ │ ┌[2,1,3]
│ └1<=3┤
│ └[2,3,1]
2<=3┤
│ ┌[1,3,2]
└1<=3┤
│ ┌[3,1,2]
└1<=2┤
└[3,2,1]
>>> (printDecTree "<=" . quickSortM traceCompare) [1,2,3]
┌[1,2,3]
┌2<=3┤
│ └[1,3,2]
┌1<=3┤
│ └[3,1,2]
1<=2┤
│ ┌[2,1,3]
└1<=3┤
│ ┌[2,3,1]
└2<=3┤
└[3,2,1]
```

We can also try it out with the other monadic list functions:

```
>>> (printDecTree "=" . groupByM traceCompare) [1,2,3]
┌[[1,2,3]]
┌2=3┤
│ └[[1,2],[3]]
1=2┤
│ ┌[[1],[2,3]]
└2=3┤
└[[1],[2],[3]]
```

You might notice that none of these “monadic” functions actually require a monad constraint: they’re all applicative. There’s a straightforward implementation that relies only on applicative for most of these functions, with a notable exception: sort. Getting *that* to work with just applicative is the subject of a future post.

Christiansen, Jan, Nikita Danilenko, and Sandra Dylus. 2016. “All Sorts of Permutations (Functional Pearl).” In *Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming*, 168–179. ICFP 2016. New York, NY, USA: ACM. doi:10.1145/2951913.2951949. http://informatik.uni-kiel.de/~sad/icfp2016-preprint.pdf.

The definition has since been updated to more modern Haskell: it now uses a fold, and only requires

`Applicative`

.↩Part of the reason the instances are so mechanical is that this type strongly resembles the free monad:

`data Free f a = Pure a | Free (f (Free f a))`

In fact, the example given in the

`MonadFree`

class is the following:`data Pair a = Pair a a`

`type Tree = Free Pair`

The only difference with the above type and the decision tree is that the decision tree carries a tag with it.

So what’s so interesting about this relationship? Well,

`Pair`

is actually a representable functor. Any representable functor`f a`

can be converted to (and from) a function`key -> a`

, where`key`

is the specific key for`f`

. The key for`Pair`

is`Bool`

: the result of the function we passed in to the sorting functions!In general, you can make a “decision tree” for any function of type

`a -> b`

like so:`type DecTree a b r = Rep f ~ b => Free (Compose ((,) a) f) r`

But more on that in a later post.↩

Here’s a useful function from Data.List:

However, as has been pointed out before^{1}, `groupBy`

expects an equivalence relation, and can exhibit surprising behavior when it doesn’t get one. Let’s say, for instance, that we wanted to group numbers that were close together:

What would you expect on the list `[1, 2, 3, 4, 5]`

? All in the same group? Well, what you actually get is:

This is because the implementation of `groupBy`

only compares to the first element in each group:

Brandon Simmons gave a definition of `groupBy`

that is perhaps more useful, but it used explicit recursion, rather than a fold.

A definition with `foldr`

turned out to be trickier than I expected. I found some of the laziness properties especially difficult:

```
>>> head (groupBy (==) (1:2:undefined))
[1]
>>> (head . head) (groupBy (==) (1:undefined))
1
>>> (head . head . tail) (groupBy (==) (1:2:undefined))
2
```

Here’s the definition I came up with, after some deliberation:

```
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy p xs = build (\c n ->
let f x a q
| q x = (x : ys, zs)
| otherwise = ([], c (x : ys) zs)
where (ys,zs) = a (p x)
in snd (foldr f (const ([], n)) xs (const False)))
{-# INLINE groupBy #-}
```

Seemingly benign changes to the function will break one or more of the above tests. In particular, the laziness of a “where” binding needs to be taken into account. Here’s an early attempt which failed:

```
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy p xs = build (\c n ->
let f x a q d
| q x = a (p x) (d . (:) x)
| otherwise = d [] (a (p x) (c . (:) x))
in foldr f (\_ d -> d [] n) xs (const False) (\ _ y -> y))
```

Once done, though, it works as expected:

```
>>> groupBy (==) "aaabcccdda"
["aaa","b","ccc","dd","a"]
>>> groupBy (==) []
[]
>>> groupBy (<=) [1,2,2,3,1,2,0,4,5,2]
[[1,2,2,3],[1,2],[0,4,5],[2]]
```

It’s the fastest version I could find that obeyed the above laziness properties.

The GHC page on the issue unfortunately seems to indicate the implementation won’t be changed. Ah, well. Regardless, I have a repository with the implementation above (with extra fusion machinery added) and comparisons to other implementations.

There are several threads on the libraries mailing list on this topic:

- 2006
- mapAccumL - find max in-sequence subsequence
- 2007
- Data.List.groupBy with non-transitive equality predicate (this is the longest discussion on the topic)
- 2008
- Generalize groupBy in a useful way?
- 2009
- nubBy seems broken in recent GHCs

Part 1 of a 3-part series on Balanced Folds

There are three main ways to fold things in Haskell: from the right, from the left, and from either side. Let’s look at the left vs right variants first. `foldr`

works from the right:

And `foldl`

from the left:

As you’ll notice, the result of the two operations above is the same (6; although one may take much longer than the other). In fact, *whenever* the result of `foldr`

and `foldl`

is the same for a pair of arguments (in this case `+`

and `0`

), we say that that pair forms a `Monoid`

for some type (well, there’s some extra stuff to do with `0`

, but I only care about associativity at the moment). In this case, the `Sum`

monoid is formed:

```
newtype Sum a = Sum { getSum :: a }
instance Num a => Monoid (Sum a) where
mempty = Sum 0
mappend (Sum x) (Sum y) = Sum (x + y)
```

When you know that you have a monoid, you can use the `foldMap`

function: this is the third kind of fold. It says that you don’t care which of `foldl`

or `foldr`

is used, so the implementer of `foldMap`

can put the parentheses wherever they want:

And we can’t tell the difference from the result. This is a pretty bare-bones introduction to folds and monoids: you won’t need to know more than that for the rest of this post, but the topic area is fascinating and deep, so don’t let me give you the impression that I’ve done anything more than scratched the surface.

Quite often, we *do* care about where the parentheses go. Take, for instance, a binary tree type, with values at the leaves:

```
data Tree a
= Empty
| Leaf a
| Tree a :*: Tree a
instance Show a =>
Show (Tree a) where
show Empty = "()"
show (Leaf x) = show x
show (l :*: r) = "(" ++ show l ++ "*" ++ show r ++ ")"
```

We can’t (well, shouldn’t) us `foldMap`

here, because we would be able to tell the difference between different arrangements of parentheses:

```
>>> foldMap something [1,2,3]
((1*2)*(3*())) │ (()*((1*2)*3)) │ (((()*1)*2)*3)
───────────────┼────────────────┼───────────────
┌1 │ ┌() │ ┌()
┌┤ │ ┤ │ ┌┤
│└2 │ │ ┌1 │ │└1
┤ │ │┌┤ │ ┌┤
│┌3 │ ││└2 │ │└2
└┤ │ └┤ │ ┤
└() │ └3 │ └3
```

So we use one of the folds which lets us choose the arrangements of parentheses:

```
>>> (foldr (:*:) Empty . map Leaf) [1,2,3,4,5,6]
(1*(2*(3*(4*(5*(6*()))))))
┌1
┌┤
│└2
┌┤
│└3
┌┤
│└4
┌┤
│└5
┌┤
│└6
┤
└()
>>> (foldl (:*:) Empty . map Leaf) [1,2,3,4,5,6]
((((((()*1)*2)*3)*4)*5)*6)
┌()
┤
│┌1
└┤
│┌2
└┤
│┌3
└┤
│┌4
└┤
│┌5
└┤
└6
```

The issue is that neither of the trees generated are necessarily what we want: often, we want something more *balanced*.

To try and find a more balanced fold, let’s (for now) assume we’re always going to get non-empty input. This will let us simplify the `Tree`

type a little, to:

```
data Tree a
= Leaf a
| Tree a :*: Tree a
deriving Foldable
instance Show a =>
Show (Tree a) where
show (Leaf x) = show x
show (l :*: r) = "(" ++ show l ++ "*" ++ show r ++ ")"
```

Then, we can use Jon Fairbairn’s fold described in this email, adapted a bit for our non-empty input:

```
import Data.List.NonEmpty (NonEmpty(..))
treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f = go
where
go (x :| []) = x
go (a :| b:l) = go (f a b :| pairMap l)
pairMap (x:y:rest) = f x y : pairMap rest
pairMap xs = xs
```

There are two parts to this function: `pairMap`

and the `go`

helper. `pairMap`

combines adjacent elements in the list using the combining function. As a top-level function it might look like this:

```
pairMap f (x:y:rest) = f x y : pairMap f rest
pairMap f xs = xs
pairMap (++) ["a","b","c","d","e"]
-- ["ab","cd","e"]
```

As you can see, it leaves any leftovers untouched at the end of the list.

The `go`

helper applies `pairMap`

repeatedly to the list until it has only one element. This gives us much more balanced results that `foldl`

or `foldr`

(turn on `-XOverloadedLists`

to write non-empty lists using this syntax):

```
>>> (treeFold (:*:) . fmap Leaf) [1,2,3,4,5,6]
(((1*2)*(3*4))*(5*6))
┌1
┌┤
│└2
┌┤
││┌3
│└┤
│ └4
┤
│┌5
└┤
└6
>>> (treeFold (:*:) . fmap Leaf) [1,2,3,4,5,6,7,8]
(((1*2)*(3*4))*((5*6)*(7*8)))
┌1
┌┤
│└2
┌┤
││┌3
│└┤
│ └4
┤
│ ┌5
│┌┤
││└6
└┤
│┌7
└┤
└8
```

However, there are still cases where one branch will be much larger than its sibling. The fold fills a balanced binary tree from the left, but any leftover elements are put at the top level. In other words:

```
>>> (treeFold (:*:) . fmap Leaf) [1..9]
((((1*2)*(3*4))*((5*6)*(7*8)))*9)
┌1
┌┤
│└2
┌┤
││┌3
│└┤
│ └4
┌┤
││ ┌5
││┌┤
│││└6
│└┤
│ │┌7
│ └┤
│ └8
┤
└9
```

That `9`

hanging out on its own there is a problem.

One observation we can make is that `pairMap`

always starts from the same side on each iteration, like a typewriter moving from one line to the next. This has the consequence of building up the leftovers on one side, leaving them until the top level.

We can improve the situation slightly by going back and forth, slalom-style, so we consume leftovers on each iteration:

```
treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f = goTo where
goTo (y :| []) = y
goTo (a :| b : rest) = goFro (pairMap f (f a b) rest)
goFro (y :| []) = y
goFro (a :| b : rest) = goTo (pairMap (flip f) (f b a) rest)
pairMap f = go [] where
go ys y (a:b:rest) = go (y:ys) (f a b) rest
go ys y [z] = z :| y : ys
go ys y [] = y :| ys
```

Notice that we have to flip the combining function to make sure the ordering is the same on output. For the earlier example, this solves the issue:

```
>>> (treeFold (:*:) . fmap Leaf) [1..9]
(((1*2)*((3*4)*(5*6)))*((7*8)*9))
┌1
┌┤
│└2
┌┤
││ ┌3
││┌┤
│││└4
│└┤
│ │┌5
│ └┤
│ └6
┤
│ ┌7
│┌┤
││└8
└┤
└9
```

It does *not* build up the tree as balanced as it possibly could, though:

```
>>> (treeFold (:*:) . fmap Leaf) [1,2,3,4,5,6]
((1*2)*((3*4)*(5*6)))
┌1
┌┤
│└2
┤
│ ┌3
│┌┤
││└4
└┤
│┌5
└┤
└6
```

There’s four elements in the right branch, and two in the left in the above example. Three in each would be optimal.

Wait—optimal in what sense, exactly? What do we mean when we say one tree is more balanced than another? Let’s say the “balance factor” is the largest difference in size of two sibling trees:

```
balFac :: Tree a -> Integer
balFac = fst . go where
go :: Tree a -> (Integer, Integer)
go (Leaf _) = (0, 1)
go (l :*: r) = (lb `max` rb `max` abs (rs - ls), rs + ls) where
(lb,ls) = go l
(rb,rs) = go r
```

And one tree is more balanced than another if it has a smaller balance factor.

There’s effectively no limit on the balance factor for the typewriter method: when the input is one larger than a power of two, it’ll stick the one extra in one branch and the rest in another (as with `[1..9]`

in the example above).

For the slalom method, it looks like there’s something more interesting going on, limit-wise. I haven’t been able to verify this formally (yet), but from what I can tell, a tree of height $n$ will have at most a balance factor of the $n$th Jacobsthal number. That’s (apparently) also the number of ways to tie a tie using $n + 2$ turns.

That was just gathered from some quick experiments and oeis.org, but it seems to make sense intuitively. Jacobsthal numbers are defined like this:

So, at the top level, there’s the imbalance caused by the second-last `pairFold`

, plus the imbalance caused by the third-to-last. However, the third-to-last imbalance is twice what it was at that level, because it is now working with an already-paired-up list. Why isn’t the second last imbalance also doubled? Because it’s counteracted by the fact that we turned around: the imbalance is in an element that’s a leftover element. At least that’s what my intuition is at this point.

The minimum balance factor is, of course, one. Unfortunately, to achieve that, I lost some of the properties of the previous folds:

Up until now, I have been avoiding taking the length of the incoming list. It would lose a lot of laziness, cause an extra traversal, and generally seems like an ugly solution. Nonetheless, it gives the most balanced results I could find so far:

```
treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f (x:|xs) = go (length (x:xs)) (x:xs) where
go 1 [y] = y
go n ys = f (go m a) (go (n-m) b) where
(a,b) = splitAt m ys
m = n `div` 2
```

`splitAt`

is an inefficient operation, but if we let the left-hand call return its unused input from the list, we can avoid it:

```
treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f (x:|xs) = fst (go (length (x:xs)) (x:xs)) where
go 1 (y:ys) = (y,ys)
go n ys = (f l r, rs) where
(l,ls) = go m ys
(r,rs) = go (n-m) ls
m = n `div` 2
```

Finally, you may have spotted the state monad in this last version. We can make the similarity explicit:

```
treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f (x:|xs) = evalState (go (length (x:xs))) (x:xs) where
go 1 = state (\(y:ys) -> (y,ys))
go n = do
let m = n `div` 2
l <- go m
r <- go (n-m)
return (f l r)
```

And there you have it: three different ways to fold in a more balanced way. Perhaps surprisingly, the first is the fastest in my tests. I’d love to hear if there’s a more balanced version (which is lazy, ideally) that is just as efficient as the first implementation.

I have found two other uses for these folds other than simply constructing more balanced binary trees. The first is summation of floating-point numbers. If you sum floating-point numbers in the usual way with `foldl'`

(or, indeed, with an accumulator in an imperative language), you will see an error growth of $\mathcal{O}(n)$, where $n$ is the number of floats you’re summing.

A well-known solution to this problem is the Kahan summation algorithm. It carries with it a running compensation for accumulating errors, giving it $\mathcal{O}(1)$ error growth. There are two downsides to the algorithm: it takes four times the number of numerical operations to perform, and isn’t parallel.

For that reason, it’s often not used in practice: instead, floats are summed *pairwise*, in a manner often referred to as cascade summation. This is what’s used in NumPy. The error growth isn’t quite as good—$\mathcal{O}(\log{n})$—but it takes the exact same number of operations as normal summation. On top of that:

Dividing a fold into roughly-equal chunks is exactly the kind of problem encountered when trying to parallelize certain algorithms. Adapting the folds above so that their work is performed in parallel is surprisingly easy:

```
splitPar :: (a -> a -> a) -> (Int -> a) -> (Int -> a) -> Int -> a
splitPar f = go
where
go l r 0 = f (l 0) (r 0)
go l r n = lt `par` (rt `pseq` f lt rt)
where
lt = l (n-m)
rt = r m
m = n `div` 2
treeFoldParallel :: (a -> a -> a) -> NonEmpty a -> a
treeFoldParallel f xs =
treeFold const (splitPar f) xs numCapabilities
```

The above will split the fold into `numCapabilities`

chunks, and perform each one in parallel. `numCapabilities`

is a constant defined in GHC.Conc: it’s the number of threads which can be run simultaneously at any one time. Alternatively, you could the function include a parameter for how many chunks to split the computation into. You could also have the fold adapt as it went, choosing whether or not to spark based on how many sparks exist at any given time:

```
parseq :: a -> b -> b
parseq a b =
runST
(bool (par a b) (seq a b) <$>
unsafeIOToST (liftA2 (>) numSparks getNumCapabilities))
treeFoldAdaptive :: (a -> a -> a) -> a -> [a] -> a
treeFoldAdaptive f =
Lazy.treeFold
(\l r ->
r `parseq` (l `parseq` f l r))
```

Adapted from this comment by Edward Kmett. This is actually the fastest version of all the folds.

All of this is provided in a library I’ve put up on Hackage.

]]>I have been working a little more on my semirings library recently, and I have come across some interesting functions in the process. First, a quick recap on the `Semiring`

class and some related functions:

```
class Semiring a where
one :: a
zero :: a
infixl 6 <+>
(<+>) :: a -> a -> a
infixl 7 <.>
(<.>) :: a -> a -> a
add :: (Foldable f, Semiring a) => f a -> a
add = foldl' (<+>) zero
mul :: (Foldable f, Semiring a) => f a -> a
mul = foldl' (<.>) one
instance Semiring Integer where
one = 1
zero = 0
(<+>) = (+)
(<.>) = (*)
instance Semiring Bool where
one = True
zero = False
(<+>) = (||)
(<.>) = (&&)
```

You can think of it as a replacement for `Num`

, but it turns out to be much more generally useful than that.

The first interesting function is to do with matrix multiplication. Here’s the code for multiplying two matrices represented as nested lists:

```
mulMatrix :: Semiring a => [[a]] -> [[a]] -> [[a]]
mulMatrix xs ys = map (\row -> map (add . zipWith (<.>) row) cs) xs
where
cs = transpose ys
```

One of the issues with this code (other than its woeful performance) is that it seems needlessly list-specific. `zipWith`

seems like the kind of thing that exists on a bunch of different structures. Indeed, the `ZipList`

wrapper uses `zipWith`

as its `<*>`

implementation. Let’s try that for now:

```
mulMatrix :: (Semiring a, Applicative f) => f (f a) -> f (f a) -> f (f a)
mulMatrix xs ys = fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
where
cs = transpose ys
```

Of course, now `add`

needs to work on our `f`

, so it should be `Foldable`

```
mulMatrix
:: (Semiring a, Applicative f, Foldable f)
=> f (f a) -> f (f a) -> f (f a)
mulMatrix = ...
```

`transpose`

is the missing piece now. A little bit of `Applicative`

magic can help us out again, though: `sequenceA`

is `transpose`

on `ZipList`

s (McBride and Paterson 2008).

```
mulMatrix
:: (Semiring a, Applicative f, Traversable f)
=> f (f a) -> f (f a) -> f (f a)
mulMatrix xs ys =
fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
where
cs = sequenceA ys
```

One further generalization: The two `f`

s don’t actually need to be the same:

```
mulMatrix
:: (Applicative n
,Traversable m
,Applicative m
,Applicative p
,Semiring a)
=> n (m a) -> m (p a) -> n (p a)
mulMatrix xs ys = fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
where
cs = sequenceA ys
```

Happily, the way that the wrappers (`n`

, `m`

, and `p`

) match up coincides precisely with how matrix dimensions match up in matrix multiplication. Quoting from the Wikipedia definition:

if $A$ is an $n \times m$ matrix and $B$ is an $m \times p$ matrix, their matrix product $AB$ is an $n \times p$ matrix

This function is present in the linear package with some different constraints. In fairness, `Applicative`

probably isn’t the best thing to use here since it doesn’t work for so many instances (`MonadZip`

or something similar may be more suitable), but it’s very handy to have, and works out-of the box for types like:

```
data Three a
= Three a a a
deriving (Functor, Foldable, Traversable, Eq, Ord, Show)
instance Applicative Three where
pure x = Three x x x
Three fx fy fz <*> Three xx xy xz = Three (fx xx) (fy xy) (fz xz)
```

Which makes it (to my mind) useful enough to keep. Also, it hugely simplified the code for matrix multiplication in square matrices I had, from Okasaki (1999).

If you’re putting a general class in a library that you want people to use, and there exist sensible instances for common Haskell types, you should probably provide those instances in the library to avoid orphans. The meaning of “sensible” here is vague: generally speaking, if there is only one obvious or clear instance, then it’s sensible. For a list instance for the semiring class, for instance, I could figure out several law-abiding definitions for `<+>`

, `one`

and `zero`

, but only one for `<.>`

: polynomial multiplication. You know, where you multiply two polynomials like so:

$(x^3 + 2x + 3)(5x + 3x^2 + 4) = 9x^5 + 15x^4 + 18x^3 + 28x^2 + 38x + 24$

A more general definition looks something like this:

$(a_0x^0 + a_1x^1 + a_2x^2)(b_0x^0 + b_1x^1 + b_2x^2) =$ $a_0b_0x^0 + (a_0b_1 + a_1b_0)x^1 + (a_0b_2 + a_1b_1 + a_2b_0)x^2 + (a_1b_2 + a_2b_1)x^3 + a_2b_2x^4$

Or, fully generalized:

$c_k = a_0b_k + a_1b_{k-1} + \ldots + a_{k-1}b_1 + a_kb_0$ $f(x) \times g(x) = \sum_{i=0}^{n+m}c_ix^i$

So it turns out that you can represent polynomials pretty elegantly as lists. Take an example from above:

$x^3 + 2x + 3$

And rearrange it in order of the powers of $x$:

$3x^0 + 2x^1 + x^3$

And fill in missing coefficients:

$3x^0 + 2x^1 + 0x^2 + 1x^3$

And then the list representation of that polynomial is the list of those coefficients:

For me, the definitions of multiplication above were pretty hard to understand. In Haskell, however, the definition is quite beautiful:

```
instance Semiring a => Semiring [a] where
one = [one]
zero = []
[] <+> ys = ys
xs <+> [] = xs
(x:xs) <+> (y:ys) = x <+> y : (xs <+> ys)
_ <.> [] = []
[] <.> _ = []
(x:xs) <.> (y:ys) = (x<.>y) : map (x<.>) ys <+> xs <.> (y:ys)
```

This definition for `<.>`

can be found on page 4 of McIlroy (1999). Although there was a version of the paper with a slightly different definition:

```
_ <.> [] = []
[] <.> _ = []
(x:xs) <.> (y:ys)
= (x<.>y) : (map (x<.>) ys <+> map (<.>y) xs <+> (zero : (xs <.> ys)))
```

Similar to one which appeared in Dolan (2013).

As it happens, I prefer the first definition. It’s shorter, and I figured out how to write it as a fold:

And if you inline the `<+>`

, you get a reasonable speedup:

```
xs <.> ys = foldr f [] xs
where
f x zs = foldr (g x) id ys (zero : zs)
g x y a (z:zs) = x <.> y <+> z : a zs
g x y a [] = x <.> y : a []
```

The definition of `<+>`

can also use a fold on either side for fusion purposes:

```
(<+>) = foldr f id where
f x xs (y:ys) = x <+> y : xs ys
f x xs [] = x : xs []
(<+>) = flip (foldr f id) where
f y ys (x:xs) = x <+> y : ys xs
f y ys [] = y : ys []
```

There are rules in the library to choose one of the above definitions if fusion is available.

This definition is much more widely useful than it may seem at first. Say, for instance, you wanted to search through pairs of things from two infinite lists. You can’t use the normal way to pair things for lists, the Cartesian product, because it will diverge:

```
[(x,y) | x <- [1..], y <- [1..]]
-- [(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),(1,8),(1,9),(1,10)...
```

You’ll never get beyond 1 in the first list. Zipping isn’t an option either, because you won’t really explore the search space, only corresponding pairs. Brent Yorgey showed that if you want a list like this:

```
[(y,x-y) | x <- [0..], y <- [0..x] ]
-- [(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0)...
```

Then what you’re looking for is a convolution (the same thing as polynomial multiplication). `<.>`

above can be adapted readily:

```
convolve :: [a] -> [b] -> [[(a,b)]]
convolve xs ys = foldr f [] xs
where
f x zs = foldr (g x) id ys ([] : zs)
g x y a (z:zs) = ((x, y) : z) : a zs
g x y a [] = [(x, y)] : a []
```

Flatten out this result to get your ordering. This convolution is a little different from the one in the blog post. By inlining `<+>`

we can avoid the expensive `++`

function, without using difference lists.

Here’s another cool use of lists as polynomials: they can be used as a positional numeral system. Most common numeral systems are positional, including Arabic (the system you most likely use, where twenty-four is written as 24) and binary. Non-positional systems are things like Roman numerals. Looking at the Arabic system for now, we see that the way of writing down numbers:

$1989$

Can be thought of the sum of each digit multiplied by ten to the power of its position:

$1989 = 1 \times 10^3 \plus 9 \times 10^2 \plus 8 \times 10^1 \plus 9 \times 10^0$ $1989 = 1 \times 1000 \plus 9 \times 100 \plus 8 \times 10 \plus 9 \times 1$ $1989 = 1000 \plus 900 \plus 80 \plus 9$ $1989 = 1989$

Where the positions are numbered from the right. In other words, it’s our polynomial list from above in reverse. As well as that, the convolution is long multiplication.

Now, taking this straight off we can try some examples:

The issue, of course, is that we’re not handling carrying properly:

No matter: we can perform all the carries after the addition, and everything works out fine:

```
carry
:: Integral a
=> a -> [a] -> [a]
carry base xs = foldr f (toBase base) xs 0
where
f e a cin = r : a q where
(q,r) = quotRem (cin + e) base
toBase :: Integral a => a -> a -> [a]
toBase base = unfoldr f where
f 0 = Nothing
f n = Just (swap (quotRem n base))
```

Wrap the whole thing in a newtype and we can have a `Num`

instance:

```
newtype Positional
= Positional
{ withBase :: Integer -> [Integer]
}
instance Num Positional where
Positional x + Positional y = Positional (carry <*> x <+> y)
Positional x * Positional y = Positional (carry <*> x <.> y)
fromInteger m = Positional (\base -> toBase base m)
abs = id
signum = id
negate = id
toDigits :: Integer -> Positional -> [Integer]
toDigits base p = reverse (withBase p base)
```

This also lets us choose our base after the fact:

```
sumHundred = (sum . map fromInteger) [1..100]
toDigits 10 sumHundred
-- [5,0,5,0]
toDigits 2 sumHundred
-- [1,0,0,1,1,1,0,1,1,1,0,1,0]
```

All the hand-optimizing, inlining, and fusion magic in the world won’t make a list-based implementation of convolution faster than a proper one on vectors, unfortunately. In particular, for larger vectors, a fast Fourier transform can be used. Also, usually code like this will be parallelized, rather than sequential. That said, it can be helpful to implement the slower version on vectors, in the usual indexed way, for comparison’s sake:

```
instance Semiring a =>
Semiring (Vector a) where
one = Vector.singleton one
zero = Vector.empty
xs <+> ys =
case compare (Vector.length xs) (Vector.length ys) of
EQ -> Vector.zipWith (<+>) xs ys
LT -> Vector.unsafeAccumulate (<+>) ys (Vector.indexed xs)
GT -> Vector.unsafeAccumulate (<+>) xs (Vector.indexed ys)
signal <.> kernel
| Vector.null signal = Vector.empty
| Vector.null kernel = Vector.empty
| otherwise = Vector.generate (slen + klen - 1) f
where
f n =
foldl'
(\a k ->
a <+>
Vector.unsafeIndex signal k <.>
Vector.unsafeIndex kernel (n - k))
zero
[kmin .. kmax]
where
!kmin = max 0 (n - (klen - 1))
!kmax = min n (slen - 1)
!slen = Vector.length signal
!klen = Vector.length kernel
```

As has been observed before (Rivas, Jaskelioff, and Schrijvers 2015) there’s a pretty suggestive similarity between semirings and the `Applicative`

/`Alternative`

classes in Haskell:

```
class Semiring a where
one :: a
zero :: a
(<+>) :: a -> a -> a
(<.>) :: a -> a -> a
class Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
class Alternative f where
empty :: f a
(<|>) :: f a -> f a -> f a
```

So can our implementation of convolution be used to implement the methods for these classes? Partially:

```
newtype Search f a = Search { runSearch :: [f a] }
instance Functor f => Functor (Search f) where
fmap f (Search xs) = Search ((fmap.fmap) f xs)
instance Alternative f => Applicative (Search f) where
pure x = Search [pure x]
_ <*> Search [] = Search []
Search xs <*> Search ys = Search (foldr f [] xs) where
f x zs = foldr (g x) id ys (empty : zs)
g x y a (z:zs) = (x <*> y <|> z) : a zs
g x y a [] = (x <*> y) : a []
instance Alternative f => Alternative (Search f) where
Search xs <|> Search ys = Search (go xs ys) where
go [] ys = ys
go xs [] = xs
go (x:xs) (y:ys) = (x <|> y) : go xs ys
empty = Search []
```

At first, this seems perfect: the types all match up, and the definitions seem sensible. The issue is with the laws: `Applicative`

and `Alternative`

are missing *four* that semirings require. In particular: commutativity of plus, annihilation by zero, and distributivity left and right:

```
xs <|> ys = ys <|> xs
empty <*> xs = fs <*> empty = empty
fs <*> (xs <|> ys) = fs <*> xs <|> fs <*> ys
(fs <|> gs) <*> xs = fs <*> xs <|> gs <*> ys
```

The vast majority of the instances of `Alternative`

today fail one or more of these laws. Taking lists as an example, `++`

obviously isn’t commutative, and `<*>`

only distributes when it’s on the right.

What’s the problem, though? Polynomial multiplication follows *more* laws than those required by `Applicative`

: why should that worry us? Unfortunately, in order for multiplication to follow those laws, it actually relies on the underlying semiring being law-abiding. And it *fails* the applicative laws when it isn’t.

There are two angles from which we could come at this problem: either we relax the semiring laws and try and make our implementation of convolution rely on them as little as possible, or we find `Alternative`

instances which follow the semiring laws. Or we could meet in the middle, relaxing the laws as much as possible until we find some `Alternative`

s that meet our standards.

This has actually been accomplished in several papers: the previously mentioned Rivas, Jaskelioff, and Schrijvers (2015) discusses near-semirings, defined as semiring-like structures with associativity, identity, and these two laws:

$0 \times x = 0$ $(x \plus y) \times z = (x \times z) \plus (y \times z)$

In contrast to normal semirings, zero only annihilates when it’s on the left, and multiplication only distributes over addition when it’s on the right. Addition is not required to be commutative.

The lovely paper Spivey (2009) has a similar concept: a “bunch”.

```
class Bunch m where
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
zero :: m a
(<|>) :: m a -> m a -> m a
wrap :: m a -> m a
```

The laws are all the same (with `<*>`

implemented in terms of `>>=`

), and the extra `wrap`

operation can be expressed like so:

A definition of `>>=`

for our polynomials is also provided:

This will require the underlying `f`

to be `Foldable`

. We can inline a little, and express the whole thing as a fold:

```
instance (Foldable f, Alternative f) => Monad (Search f) where
Search xs >>= k = foldr f empty xs where
f e a = foldr ((<|>) . k) (wrap a) e
```

For `Search`

to meet the requirements of a bunch, the paper notes that the `f`

must be assumed to be a bag, i.e., the order of its elements must be ignored.

Kiselyov et al. (2005) kind of goes the other direction, defining a monad which has fair disjunction and conjunction. Unfortunately, the fair conjunction loses associativity.

The end of the paper on algebras for combinatorial search wonders if notions of distance could be added to some of the algebras. I *think* that should be as simple as supplying a suitable near-semiring for `f`

, but the definition of `>>=`

would need to be changed. The near-semiring I had in mind was the probability monad. It works correctly if inlined:

```
newtype Search s a = Search { runSearch :: [[(a,s)]] }
instance Functor (Search s) where
fmap f (Search xs) = Search ((fmap.fmap.first) f xs)
instance Semiring s => Applicative (Search s) where
pure x = Search [[(x,one)]]
_ <*> Search [] = Search []
Search xs <*> Search ys = Search (foldr f [] xs) where
f x zs = foldr (g x) id ys (empty : zs)
g x y a (z:zs) = (m x y ++ z) : a zs
g x y a [] = (m x y) : a []
m ls rs = [(l r, lp<.>rp) | (l,lp) <- ls, (r,rp) <- rs]
instance Semiring s => Alternative (Search s) where
Search xs <|> Search ys = Search (go xs ys) where
go [] ys = ys
go xs [] = xs
go (x:xs) (y:ys) = (x ++ y) : go xs ys
empty = Search []
wrap :: Search s a -> Search s a
wrap (Search xs) = Search ([] : xs)
instance Semiring s => Monad (Search s) where
Search xs >>= k = foldr f empty xs where
f e a = foldr ((<|>) . uncurry (mulIn . k)) (wrap a) e
mulIn (Search x) xp = Search ((fmap.fmap.fmap) (xp<.>) x)
```

But I couldn’t figure out how to get it to work for a more generalized inner monad. The above could probably be sped up, or randomized, using the many well-known techniques for probability monad optimization.

Dolan, Stephen. 2013. “Fun with semirings: A functional pearl on the abuse of linear algebra.” In, 48:101. ACM Press. doi:10.1145/2500365.2500613. https://www.cl.cam.ac.uk/~sd601/papers/semirings.pdf.

Kiselyov, Oleg, Chung-chieh Shan, Daniel P Friedman, and Amr Sabry. 2005. “Backtracking, interleaving, and terminating monad transformers (functional pearl).” *ACM SIGPLAN Notices* 40 (9): 192–203. http://okmij.org/ftp/Computation/monads.html#LogicT.

McBride, Conor, and Ross Paterson. 2008. “Applicative programming with effects.” *Journal of functional programming* 18 (01): 1–13. http://strictlypositive.org/Idiom.pdf.

McIlroy, M. Douglas. 1999. “Power Series, Power Serious.” *J. Funct. Program.* 9 (3) (May): 325–337. doi:10.1017/S0956796899003299. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.333.3156&rep=rep1&type=pdf.

Okasaki, Chris. 1999. “From Fast Exponentiation to Square Matrices: An Adventure in Types.” In *Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP’99), Paris, France, September 27-29, 1999*, 34:28. ACM. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.456.357&rep=rep1&type=pdf.

Rivas, Exequiel, Mauro Jaskelioff, and Tom Schrijvers. 2015. “From monoids to near-semirings: The essence of MonadPlus and Alternative.” In *Proceedings of the 17th International Symposium on Principles and Practice of Declarative Programming*, 196–207. ACM. doi:10.1145/2790449.2790514. http://www.fceia.unr.edu.ar/~mauro/pubs/FromMonoidstoNearsemirings.pdf.

Spivey, J. Michael. 2009. “Algebras for combinatorial search.” *Journal of Functional Programming* 19 (3-4) (July): 469–487. doi:10.1017/S0956796809007321. https://pdfs.semanticscholar.org/db3e/373bb6e7e7837ebc524da0a25903958554ed.pdf.

Tags: Haskell, Applicative

There are a couple partial functions in the Haskell Prelude which people seem to agree shouldn’t be there. `head`

, for example, will throw an error on an empty list. Most seem to agree that it should work something more like this:

There are other examples, like `last`

, `!!`

, etc.

One which people *don’t* agree on, however, is division by zero. In the current Prelude, the following will throw an error:

The “safe” version might have a signature like this:

However, this turns out to be quite a headache for writing code generally. So the default is the (somewhat) unsafe version.

Is there a way to introduce a safer version without much overhead, so the programmer is given the option? Of course! With some newtype magic, it’s pretty simple to write a wrapper which catches division by zero in some arbitrary monad:

```
newtype AppNum f a = AppNum
{ runAppNum :: f a
} deriving (Functor,Applicative,Monad,Alternative,Show,Eq,MonadFail)
instance (Num a, Applicative f) =>
Num (AppNum f a) where
abs = fmap abs
signum = fmap signum
(+) = liftA2 (+)
(*) = liftA2 (*)
(-) = liftA2 (-)
negate = fmap negate
fromInteger = pure . fromInteger
instance (Fractional a, MonadFail f, Eq a) =>
Fractional (AppNum f a) where
fromRational = pure . fromRational
xs / ys =
ys >>=
\case
0 -> fail "divide by zero"
y -> fmap (/ y) xs
```

I’m using the `-XLambdaCase`

extension and `MonadFail`

here.

You’ll notice that you only need `Applicative`

for most of the arithmetic operations above. In fact, you only need `Monad`

when you want to examine the contents of `f`

. Using that fact, we can manipulate expression trees using the free applicative from the free package. Say, for instance, we want to have free variables in our expressions. Using `Either`

, it’s pretty easy:

```
type WithVars = AppNum (Ap (Either String)) Integer
var :: String -> WithVars
var = AppNum . liftAp . Left
```

We can collect the free variables from an expression:

```
vars :: WithVars -> [String]
vars = runAp_ (either pure (const [])) . runAppNum
x = 1 :: WithVars
y = var "y"
z = var "z"
vars (x + y + z) -- ["y","z"]
```

If we want to sub in, though, we’re going to run into a problem: we can’t just pass in a `Map String Integer`

because you’re able to construct values like this:

```
bad :: AppNum (Ap (Either String)) (Integer -> Integer -> Integer)
bad = AppNum (liftAp (Left "oh noes"))
```

We’d need to pass in a `Map String (Integer -> Integer -> Integer)`

as well; in fact you’d need a map for every possible type. Which isn’t feasible.

Luckily, we *can* constrain the types of variables in our expression so that they’re always `Integer`

, using a GADT:

The type above seems useless on its own: it doesn’t have a `Functor`

instance, never mind an `Applicative`

, so how can it fit into `AppNum`

?

The magic comes from the free applicative, which converts any type of kind `Type -> Type`

into an applicative. With that in mind, we can change around the previous code:

```
type WithVars = AppNum (Ap Variable) Integer
var :: String -> WithVars
var = AppNum . liftAp . Variable
vars :: WithVars -> [String]
vars = runAp_ f . runAppNum
where
f :: Variable a -> [String]
f (Constant _) = []
f (Variable s) = [s]
```

And write the function to sub in for us:

```
variableA
:: Applicative f
=> (String -> f Integer) -> Variable a -> f a
variableA _ (Constant x) = pure x
variableA f (Variable s) = f s
variable :: (String -> Integer) -> Variable a -> a
variable _ (Constant x) = x
variable f (Variable s) = f s
replace :: Map String Integer -> WithVars -> Integer
replace m = runAp (variable (m Map.!)) . runAppNum
replace (Map.fromList [("z",2), ("y",3)]) (x + y + z)
-- 6
```

This will fail if a free variable isn’t present in the map, unfortunately. To fix it, we *could* use `Either`

instead of `Identity`

:

```
replace :: Map String Integer -> WithVars -> Either String Integer
replace m =
runAp
(variableA $
\s ->
maybe (Left s) Right (Map.lookup s m)) .
runAppNum
```

But this only gives us the first missing variable encountered. We’d like to get back *all* of the missing variables, ideally: accumulating the `Left`

s. `Either`

doesn’t accumulate values, as if it did it would break the monad laws.

There’s no issue with the *applicative* laws, though, which is why the validation package provides a *non-monadic* either-like type, which we can use here.

```
replace :: Map String Integer -> WithVars -> AccValidation [String] Integer
replace m =
runAp
(variableA $
\s ->
maybe (AccFailure [s]) pure (Map.lookup s m)) .
runAppNum
replace (Map.fromList []) (x + y + z)
-- AccFailure ["y","z"]
```

There are a bunch more applicatives you could use instead of `Either`

. Using lists, for instance, you could calculate the possible outcomes from a range of inputs:

```
range :: WithVars -> [Integer]
range = runAp (variable (const [1..3])) . runAppNum
range (x + y + z)
-- [3,4,5,4,5,6,5,6,7]
```

Or you could ask the user for input:

```
query :: WithVars -> IO Integer
query = runAp (variable f) . runAppNum
where
f s = do
putStr "Input a value for "
putStrLn s
fmap read getLine
```

Finally, and this one’s a bit exotic, you could examine every variable in turn, with defaults for the others:

```
zygo
:: (forall x. f x -> x)
-> (forall x. f x -> (x -> a) -> b)
-> Ap f a
-> [b]
zygo (l :: forall x. f x -> x) (c :: forall x. f x -> (x -> a) -> b) =
fst . go id
where
go :: forall c. (c -> a) -> Ap f c -> ([b], c)
go _ (Pure x) = ([], x)
go k (Ap x f) = (c x (k . ls) : xs, ls lx)
where
(xs,ls) = go (k . ($ lx)) f
lx = l x
examineEach :: WithVars -> [Integer -> Integer]
examineEach = zygo (variable (const 1)) g . runAppNum
where
g :: Variable a -> (a -> b) -> Integer -> b
g (Constant x) rhs _ = rhs x
g (Variable _) rhs i = rhs i
```

This produces a list of functions which are equivalent to subbing in for each variable with the rest set to 1.

]]>A while ago I read this post on reddit (by David Feuer), about sorting traversables (which was a follow-up on this post by Will Fancher), and I was inspired to write some pseudo-dependently-typed Haskell. The post (and subsequent library) detailed how to use size-indexed heaps to perform fast, total sorting on any traversable. I ended up with a library which has five size-indexed heaps (Braun, pairing, binomial, skew, and leftist), each verified for structural correctness. I also included the non-indexed implementations of each for comparison (as well as benchmarks, tests, and all that good stuff).

The purpose of this post is to go through some of the tricks I used and problems I encountered writing a lot of type-level code in modern Haskell.

In order to index things by their size, we’ll need a type-level representation of size. We’ll use Peano numbers for now:

`Z`

stands for zero, and `S`

for successor. The terseness is pretty necessary here, unfortunately: arithmetic becomes unreadable otherwise. The simplicity of this definition is useful for proofs and manipulation; however any runtime representation of these numbers is going to be woefully slow.

With the `DataKinds`

extension, the above is automatically promoted to the type-level, so we can write type-level functions (type families) on the `Peano`

type:

Here the `TypeFamilies`

extension is needed. I’ll try and mention every extension I’m using as we go, but I might forget a few, so check the repository for all of the examples (quick aside: I *did* manage to avoid using `UndecidableInstances`

, but more on that later). One pragma that’s worth mentioning is:

This suppresses warnings on the definition of `Plus`

above. Without it, GHC would want us to write:

```
type family Plus (n :: Peano) (m :: Peano) :: Peano where
Plus 'Z m = m
Plus ('S n) m = 'S (Plus n m)
```

I think that looks pretty ugly, and it can get much worse with more involved arithmetic. The only thing I have found the warnings useful for is `[]`

: the type-level empty list gives an error in its unticked form.

In the original post, a pairing heap (Fredman et al. 1986) was used, for its simplicity and performance. The implementation looked like this:

```
data Heap n a where
E :: Heap Z a
T :: a -> HVec n a -> Heap (S n) a
data HVec n a where
HNil :: HVec Z a
HCons :: Heap m a -> HVec n a -> HVec (Plus m n) a
```

You immediately run into trouble when you try to define merge:

```
merge :: Ord a => Heap m a -> Heap n a -> Heap (Plus m n) a
merge E ys = ys
merge xs E = xs
merge h1@(T x xs) h2@(T y ys)
| x <= y = T x (HCons h2 xs)
| otherwise = T y (HCons h1 ys)
```

Three errors show up here, but we’ll look at the first one:

`Could not deduce (m ~ (Plus m Z))`

GHC doesn’t know that $x = x + 0$. Somehow, we’ll have to *prove* that it does.

In a language with true dependent types, proving the proposition above is as simple as:

```
plusZeroNeutral : (n : Nat) -> n + 0 = n
plusZeroNeutral Z = Refl
plusZeroNeutral (S k) = cong (plusZeroNeutral k)
```

(this example is in Idris)

In Haskell, on the other hand, we can’t do the same: functions on the value-level `Peano`

have no relationship with functions on the type-level `Peano`

. There’s no way to automatically link or promote one to the other.

This is where singletons come in (Eisenberg and Weirich 2012). A singleton is a datatype which mirrors a type-level value exactly, except that it has a type parameter which matches the equivalent value on the type-level. In this way, we can write functions on the value-level which are linked to the type-level. Here’s a potential singleton for `Peano`

:

(we need `GADTs`

for this example)

Now, when we pattern-match on `Natty`

, we get a proof of whatever its type parameter was. Here’s a trivial example:

When we match on `Zy`

, the *only value* which `n`

could have been is `Z`

, because the only way to construct `Zy`

is if the type parameter is `Z`

.

Using this technique, the `plusZeroNeutral`

proof looks reasonably similar to the Idris version:

```
plusZeroNeutral :: Natty n -> Plus n Z :~: n
plusZeroNeutral Zy = Refl
plusZeroNeutral (Sy n) = case plusZeroNeutral n of
Refl -> Refl
```

To generalize the singletons a little, we could probably use the singletons library, or we could roll our own:

```
data family The k :: k -> Type
data instance The Peano n where
Zy :: The Peano Z
Sy :: The Peano n -> The Peano (S n)
plusZeroNeutral :: The Peano n -> Plus n Z :~: n
plusZeroNeutral Zy = Refl
plusZeroNeutral (Sy n) = case plusZeroNeutral n of
Refl -> Refl
```

The `The`

naming is kind of cute, I think. It makes the signature look *almost* like the Idris version (`the`

is a function from the Idris standard library). The `The`

type family requires the `TypeInType`

extension, which I’ll talk a little more about later.

There’s an issue with these kinds of proofs: the proof code runs *every time* it is needed. Since the same value is coming out the other end each time (`Refl`

), this seems wasteful.

In a language like Idris, this problem is avoided by noticing that you’re only using the proof for its type information, and then erasing it at runtime. In Haskell, we can accomplish the same with a rule:

```
{-# NOINLINE plusZeroNeutral #-}
{-# RULES
"plusZeroNeutral" forall x. plusZeroNeutral x
= unsafeCoerce (Refl :: 'Z :~: 'Z)
#-}
```

This basically says “if this type-checks, then the proof must exist, and therefore the proof must be valid. So don’t bother running it”. Unfortunately, that’s a *little bit* of a lie. It’s pretty easy to write a proof which type-checks that *isn’t* valid:

We won’t be able to perform computations which rely on this proof in Haskell, though: because the computation will never terminate, the proof will never provide an answer. This means that, while the proof isn’t valid, it *is* type safe. That is, of course, unless we use our manual proof-erasure technique. The `RULES`

pragma will happily replace it with the `unsafeCoerce`

version, effectively introducing unsoundness into our proofs. The reason that this doesn’t cause a problem for language like Idris is that Idris has a totality checker: you *can’t* write the above definition (with the totality checker turned on) in Idris.

So what’s the solution? Do we have to suffer through the slower proof code to maintain correctness? In reality, it’s usually OK to assume termination. It’s pretty easy to see that a proof like `plusZeroNeutral`

is total. It’s worth bearing in mind, though, that until Haskell gets a totality checker (likely never, apparently) these proofs aren’t “proper”.

One extra thing: while you’re proving things in one area of your code, you might not have the relevant singleton handy. To generate them on-demand, you’ll need a typeclass:

```
class KnownSing (x :: k) where
sing :: The k x
instance KnownSing Z where
sing = Zy
instance KnownSing n => KnownSing (S n) where
sing = Sy sing
```

This kind of drives home the inefficiency of singleton-based proofs, and why it’s important to erase them aggressively.

One other way to solve these problems is to try find a data structure which runs the proof code anyway. As an example, consider a length-indexed list:

You might worry that concatenation of two lists requires some expensive proof code, like `merge`

for the pairing heap. Maybe surprisingly, the default implementation just works:

```
infixr 5 ++
(++) :: List n a -> List m a -> List (Plus n m) a
(++) Nil ys = ys
(++) (x :- xs) ys = x :- xs ++ ys
```

Why? Well, if you look back to the definition of `Plus`

, it’s almost exactly the same as the definition of `(++)`

. In effect, we’re using *lists* as the singleton for `Peano`

here.

The question is, then: is there a heap which performs these proofs automatically for functions like merge? As far as I can tell: *almost*. First though:

The standard definition of `++`

on normal lists can be cleaned up a little with `foldr`

Can we get a similar definition for our length-indexed lists? Turns out we can, but the type of `foldr`

needs to be a little different:

```
foldrList :: (forall x. a -> b x -> b (S x))
-> b m -> List n a -> b (n + m)
foldrList f b Nil = b
foldrList f b (x :- xs) = f x (foldrList f b xs)
newtype Flip (f :: t -> u -> Type) (a :: u) (b :: t)
= Flip { unFlip :: f b a }
foldrList1 :: (forall x. a -> b x c -> b (S x) c)
-> b m c -> List n a -> b (n + m) c
foldrList1 f b
= unFlip . foldrList (\e -> Flip . f e . unFlip) (Flip b)
infixr 5 ++
(++) :: List n a -> List m a -> List (n + m) a
(++) = flip (foldrList1 (:-))
```

So what’s the point of this more complicated version? Well, if this were normal Haskell, we might get some foldr-fusion or something (in reality we would probably use `augment`

if that were the purpose).

With this type-level business, though, there’s a similar application: loop unrolling. Consider the natural-number type again. We can write a typeclass which will perform induction over them:

```
class KnownPeano (n :: Peano) where
unrollRepeat :: Proxy n -> (a -> a) -> a -> a
instance KnownPeano Z where
unrollRepeat _ = const id
{-# INLINE unrollRepeat #-}
instance KnownPeano n =>
KnownPeano (S n) where
unrollRepeat (_ :: Proxy (S n)) f x =
f (unrollRepeat (Proxy :: Proxy n) f x)
{-# INLINE unrollRepeat #-}
```

Because the recursion here calls a different `unrollRepeat`

function in the “recursive” call, we get around the usual hurdle of not being able to inline recursive calls. That means that the whole loop will be unrolled, at compile-time. We can do the same for foldr:

```
class HasFoldr (n :: Peano) where
unrollFoldr
:: (forall x. a -> b x -> b (S x))
-> b m
-> List n a
-> b (n + m)
instance HasFoldr Z where
unrollFoldr _ b _ = b
{-# INLINE unrollFoldr #-}
instance HasFoldr n => HasFoldr (S n) where
unrollFoldr f b (x :- xs) = f x (unrollFoldr f b xs)
{-# INLINE unrollFoldr #-}
```

I can’t think of many uses for this technique, but one that comes to mind is an n-ary uncurry (like Lisp’s apply):

```
infixr 5 :-
data List (xs :: [*]) where
Nil :: List '[]
(:-) :: a -> List xs -> List (a ': xs)
class KnownList (xs :: [*]) where
foldrT
:: (forall y ys. y -> result ys -> result (y ': ys))
-> result '[]
-> List xs
-> result xs
instance KnownList ('[] :: [*]) where
foldrT _ = const
{-# INLINE foldrT #-}
instance KnownList xs =>
KnownList (x ': xs) where
foldrT f b (x :- xs) = f x (foldrT f b xs)
{-# INLINE foldrT #-}
type family Func (xs :: [*]) (y :: *) where
Func '[] y = y
Func (x ': xs) y = x -> Func xs y
newtype FunType y xs = FunType
{ runFun :: Func xs y -> y
}
uncurry
:: KnownList xs
=> Func xs y -> List xs -> y
uncurry f l =
runFun
(foldrT
(c (\x g h -> g (h x)))
(FunType id)
l)
f
where
c :: (a -> ((Func xs y -> y) -> (Func zs z -> z)))
-> (a -> (FunType y xs -> FunType z zs))
c = coerce
{-# INLINE c #-}
{-# INLINE uncurry #-}
```

I *think* that you can be guaranteed the above is inlined at compile-time, making it essentially equivalent to a handwritten `uncurry`

.

Anyway, back to the size-indexed heaps. The reason that `(++)`

worked so easily on lists is that a list can be thought of as the data-structure equivalent to Peano numbers. Another numeric-system-based data structure is the binomial heap, which is based on binary numbering (I’m going mainly off of the description from Hinze 1999).

So, to work with binary numbers, let’s get some preliminaries on the type-level out of the way:

```
data instance The Bool x where
Falsy :: The Bool False
Truey :: The Bool True
data instance The [k] xs where
Nily :: The [k] '[]
Cony :: The k x -> The [k] xs -> The [k] (x : xs)
instance KnownSing True where
sing = Truey
instance KnownSing False where
sing = Falsy
instance KnownSing '[] where
sing = Nily
instance (KnownSing xs, KnownSing x) =>
KnownSing (x : xs) where
sing = Cony sing sing
```

We’ll represent a binary number as a list of Booleans:

```
type family Sum (x :: Bool) (y :: Bool) (cin :: Bool) :: Bool where
Sum False False False = False
Sum False False True = True
Sum False True False = True
Sum False True True = False
Sum True False False = True
Sum True False True = False
Sum True True False = False
Sum True True True = True
type family Carry (x :: Bool) (y :: Bool) (cin :: Bool)
(xs :: [Bool]) (ys :: [Bool]) :: [Bool] where
Carry False False False xs ys = Add False xs ys
Carry False False True xs ys = Add False xs ys
Carry False True False xs ys = Add False xs ys
Carry False True True xs ys = Add True xs ys
Carry True False False xs ys = Add False xs ys
Carry True False True xs ys = Add True xs ys
Carry True True False xs ys = Add True xs ys
Carry True True True xs ys = Add True xs ys
type family Add (cin :: Bool) (xs :: [Bool]) (ys :: [Bool]) ::
[Bool] where
Add c (x : xs) (y : ys) = Sum x y c : Carry x y c xs ys
Add False '[] ys = ys
Add False xs '[] = xs
Add True '[] ys = CarryOne ys
Add True xs '[] = CarryOne xs
type family CarryOne (xs :: [Bool]) :: [Bool] where
CarryOne '[] = True : '[]
CarryOne (False : xs) = True : xs
CarryOne (True : xs) = False : CarryOne xs
```

The odd definition of `Carry`

is to avoid `UndecidableInstances`

: if we had written, instead:

```
type family Carry (x :: Bool) (y :: Bool) (cin :: Bool) :: Bool where
Carry False False False = False
Carry False False True = False
Carry False True False = False
Carry False True True = True
Carry True False False = False
Carry True False True = True
Carry True True False = True
Carry True True True = True
type family Add (cin :: Bool) (xs :: [Bool]) (ys :: [Bool]) ::
[Bool] where
Add c (x : xs) (y : ys) = Sum x y c : Add (Carry x y c) xs ys
Add False '[] ys = ys
Add False xs '[] = xs
Add True '[] ys = CarryOne ys
Add True xs '[] = CarryOne xs
```

We would have been warned about nested type-family application.

Now we can base the merge function very closely on these type families. First, though, we’ll have to implement the heap.

There are different potential properties you can verify in a data structure. In the sort-traversable post, the property of interest was that the number of elements in the structure would stay the same after adding and removing some number $n$ of elements. For this post, we’ll also verify structural invariants. I won’t, however, verify the heap property. Maybe in a later post.

When indexing a data structure by its size, you encode an awful lot of information into the type signature: the type becomes very *specific* to the structure in question. It is possible, though, to encode a fair few structural invariants *without* getting so specific. Here’s a signature for “perfect leaf tree”:

With that signature, it’s *impossible* to create a tree with more elements in its left branch than its right; the size of the tree, however, remains unspecified. You can use a similar trick to implement matrices which must be square (from Okasaki 1999): the usual trick (`type Matrix n a = List n (List n a)`

) is too specific, providing size information at compile-time. If you’re interested in this approach, there are several more examples in Hinze (2001).

It is possible to go from the size-indexed version back to the non-indexed version, with an existential (`RankNTypes`

for this example):

This will let you prove invariants in your implementation using an index, while keeping the user-facing type signature general and non-indexed.

Wasserman (2010), was able to encode all of the structural invariants of the binomial heap *without* indexing by its size (well, all invariants except truncation, which turned out to be important a little later). I’ll be using a similar approach, except I’ll leverage some of the newer bells and whistles in GHC. Where Wasserman’s version used types like this for the numbering:

We can reuse the type-level Peano numbers with a GADT:

```
infixr 5 :-
data Binomial xs rk a where
Nil :: Binomial '[] n a
Skip :: Binomial xs (S rk) a -> Binomial (False : xs) rk a
(:-) :: Tree rk a
-> Binomial xs (S rk) a
-> Binomial (True : xs) rk a
data Tree rk a = Root a (Node rk a)
infixr 5 :<
data Node n a where
NilN :: Node Z a
(:<) :: Tree n a -> Node n a -> Node (S n) a
```

The definition of `Tree`

here ensures that any tree of rank $n$ has $2^n$ elements. The binomial heap, then, is a list of trees, in ascending order of size, with a `True`

at every point in its type-level list where a tree is present, and a `False`

wherever one is absent. In other words, the type-level list is a binary encoding of the number of elements it contains.

And here are the merge functions:

```
mergeTree :: Ord a => Tree rk a -> Tree rk a -> Tree (S rk) a
mergeTree xr@(Root x xs) yr@(Root y ys)
| x <= y = Root x (yr :< xs)
| otherwise = Root y (xr :< ys)
merge
:: Ord a
=> Binomial xs z a
-> Binomial ys z a
-> Binomial (Add False xs ys) z a
merge Nil ys = ys
merge xs Nil = xs
merge (Skip xs) (Skip ys) = Skip (merge xs ys)
merge (Skip xs) (y :- ys) = y :- merge xs ys
merge (x :- xs) (Skip ys) = x :- merge xs ys
merge (x :- xs) (y :- ys) = Skip (mergeCarry (mergeTree x y) xs ys)
mergeCarry
:: Ord a
=> Tree rk a
-> Binomial xs rk a
-> Binomial ys rk a
-> Binomial (Add True xs ys) rk a
mergeCarry t Nil ys = carryOne t ys
mergeCarry t xs Nil = carryOne t xs
mergeCarry t (Skip xs) (Skip ys) = t :- merge xs ys
mergeCarry t (Skip xs) (y :- ys) = Skip (mergeCarry (mergeTree t y) xs ys)
mergeCarry t (x :- xs) (Skip ys) = Skip (mergeCarry (mergeTree t x) xs ys)
mergeCarry t (x :- xs) (y :- ys) = t :- mergeCarry (mergeTree x y) xs ys
carryOne
:: Ord a
=> Tree rk a -> Binomial xs rk a -> Binomial (CarryOne xs) rk a
carryOne t Nil = t :- Nil
carryOne t (Skip xs) = t :- xs
carryOne t (x :- xs) = Skip (carryOne (mergeTree t x) xs)
```

You’ll notice that no proofs are needed: that’s because the merge function itself is the same as the type family, like the way `++`

for lists was the same as the `Plus`

type family.

Of course, this structure is only verified insofar as you believe the type families. It does provide a degree of double-entry, though: any mistake in the type family will have to be mirrored in the merge function to type-check. On top of that, we can write some proofs of properties we might expect:

```
addCommutes
:: The [Bool] xs
-> The [Bool] ys
-> Add False xs ys :~: Add False ys xs
addCommutes Nily _ = Refl
addCommutes _ Nily = Refl
addCommutes (Cony Falsy xs) (Cony Falsy ys) =
gcastWith (addCommutes xs ys) Refl
addCommutes (Cony Truey xs) (Cony Falsy ys) =
gcastWith (addCommutes xs ys) Refl
addCommutes (Cony Falsy xs) (Cony Truey ys) =
gcastWith (addCommutes xs ys) Refl
addCommutes (Cony Truey xs) (Cony Truey ys) =
gcastWith (addCommutesCarry xs ys) Refl
addCommutesCarry
:: The [Bool] xs
-> The [Bool] ys
-> Add True xs ys :~: Add True ys xs
addCommutesCarry Nily _ = Refl
addCommutesCarry _ Nily = Refl
addCommutesCarry (Cony Falsy xs) (Cony Falsy ys) =
gcastWith (addCommutes xs ys) Refl
addCommutesCarry (Cony Truey xs) (Cony Falsy ys) =
gcastWith (addCommutesCarry xs ys) Refl
addCommutesCarry (Cony Falsy xs) (Cony Truey ys) =
gcastWith (addCommutesCarry xs ys) Refl
addCommutesCarry (Cony Truey xs) (Cony Truey ys) =
gcastWith (addCommutesCarry xs ys) Refl
```

Unfortunately, though, this method *does* require proofs (ugly proofs) for the delete-min operation. One of the issues is truncation: since the binary digits are stored least-significant-bit first, the same number can be represented with any number of trailing zeroes. This kept causing problems for me when it came to subtraction, and adding the requirement of no trailing zeroes (truncation) to the constructors for the heap was a pain, requiring extra proofs on merge to show that it preserves truncation.

Since some of these properties are much easier to verify on the type-level Peano numbers, one approach might be to convert back and forth between Peano numbers and binary, and use the proofs on Peano numbers instead.

```
type family BintoPeano (xs :: [Bool]) :: Peano where
BintoPeano '[] = Z
BintoPeano (False : xs) = BintoPeano xs + BintoPeano xs
BintoPeano (True : xs) = S (BintoPeano xs + BintoPeano xs)
```

First problem: this requires `UndecidableInstances`

. I’d *really* rather not have that turned on, to be honest. In Idris (and Agda), you can *prove* decidability using a number of different methods, but this isn’t available in Haskell yet.

Regardless, we can push on.

To go in the other direction, we’ll need to calculate the parity of natural numbers. Taken from the Idris tutorial:

```
data Parity (n :: Peano) where
Even :: The Peano n -> Parity (n + n)
Odd :: The Peano n -> Parity (S (n + n))
parity :: The Peano n -> Parity n
parity Zy = Even Zy
parity (Sy Zy) = Odd Zy
parity (Sy (Sy n)) = case parity n of
Even m -> gcastWith (plusSuccDistrib m m) (Even (Sy m))
Odd m -> gcastWith (plusSuccDistrib m m) (Odd (Sy m))
plusSuccDistrib :: The Peano n -> proxy m -> n + S m :~: S (n + m)
plusSuccDistrib Zy _ = Refl
plusSuccDistrib (Sy n) p = gcastWith (plusSuccDistrib n p) Refl
```

We need this function on the type-level, though, not the value-level: here, again, we run into trouble. What does `gcastWith`

look like on the type-level? As far as I can tell, it doesn’t exist (yet. Although I haven’t looked deeply into the singletons library yet).

This idea of doing dependently-typed stuff on the type-level *started* to be possible with `TypeInType`

. For instance, we could have defined our binary type as: