Tags: Agda

New paper: “Algebraic Effects Meet Hoare Logic in Cubical Agda”, by myself, Zhixuan Yang, and Nicolas Wu, will be published at POPL 2024.

Zhixuan has a nice summary of it here.

The preprint is available here.

]]>
Tags: Haskell

Here’s a cool trick:

```
minimum :: Ord a => [a] -> a
minimum = head . sort
```

This is $\mathcal{O}(n)$ in Haskell, not $\mathcal{O}(n \log n)$ as you might expect. And this isn’t because Haskell is using some weird linear-time sorting algorithm; indeed, the following is $\mathcal{O}(n \log n)$:

```
maximum :: Ord a => [a] -> a
maximum = last . sort
```

No: since the implementation of `minimum`

above only
demands the first element of the list, and since `sort`

has
been carefully implemented, only a linear amount of work will be done to
retrieve it.

It’s not easy to structure programs to have the same property as
`sort`

does above: to be maximally lazy, such that
unnecessary work is not performed. Today I was working on a maximally
lazy implementation of the following program:

```
groupOn :: Eq k => (a -> k) -> [a] -> [(k,[a])]
= ...
groupOn
>>> groupOn (`rem` 2) [1..5]
1,[1,3,5]),(0,[2,4])]
[(
>>> groupOn (`rem` 3) [5,8,3,6,2]
2,[5,8,2]),(0,[3,6])] [(
```

This function groups the elements of a list according to some key function. The desired behaviour here is a little subtle: we don’t want to just group adjacent elements, for instance.

``rem` 3) [5,8,3,6,2] ≢ [(2,[5,8]),(0,[3,6]),(2,[2])] groupOn (`

And we don’t want to reorder the elements of the list by the keys:

``rem` 3) [5,8,3,6,2] ≢ [(0,[3,6]),(2,[5,8,2])] groupOn (`

These constraints make it especially tricky to make this function
lazy. In fact, at first glance, it seems impossible. What should, for
instance, `groupOn id [1..]`

return? It can’t even fill out
the first group, since it will never find another `1`

.
However, it *can* fill out the first key. And, in fact, the
second. And it can fill out the first element of the first group.
Precisely:

`id [1..] ≡ [(1,1:⊥), (2,2:⊥), (3,3:⊥), ... groupOn `

Another example is `groupOn id (repeat 1)`

, or
`groupOn id (cycle [1,2,3])`

. These each have
partially-defined answers:

```
id (repeat 1) ≡ (1,repeat 1):⊥
groupOn
id (cycle [1,2,3]) ≡ (1,repeat 1):(2,repeat 2):(3,repeat 3):⊥ groupOn
```

So there is some kind of well-defined lazy semantics for this function. The puzzle I was interested in was defining an efficient implementation for these semantics.

The first approximation to a solution I could think of is the following:

```
groupOn :: Ord k => (a -> k) -> [a] -> [(k, [a])]
= Map.toList . Map.fromListWith (++) . map (\x -> (k x, [x])) groupOn k
```

In fact, if you don’t care about laziness, this is probably the best solution: it’s $\mathcal{O}(n \log n)$, it performs well (practically as well as asymptotically), and it has the expected results.

However, there are problems. Primarily this solution cares about ordering, which we don’t want. We want to emit the results in the same order that they were in the original list, and we don’t necessarily want to require an ordering on the elements (for the efficient solution we will relax this last constraint).

Instead, let’s implement our own “map” type that is inefficient, but more general.

```
type Map a b = [(a,b)]
insertWith :: Eq a => (b -> b -> b) -> a -> b -> Map a b -> Map a b
= [(k,v)]
insertWith f k v [] :xs)
insertWith f k v ((k',v')| k == k' = (k',f v v') : xs
| otherwise = (k',v') : insertWith f k v xs
groupOn :: Eq k => (a -> k) -> [a] -> [(k, [a])]
= foldr (uncurry (insertWith (++))) [] . map (\x -> (k x, [x])) groupOn k
```

The problem here is that it’s not lazy enough.
`insertWith`

is strict in its last argument, which means that
using `foldr`

doesn’t gain us anything laziness-wise.

There is some extra information we can use to drive the result: we know that the result will have keys that are in the same order as they appear in the list, with duplicates removed:

```
groupOn :: Eq k => (a -> k) -> [a] -> [(k, [a])]
= map _ ks
groupOn k xs where
= map k xs ks
```

From here, we can get what the values should be from each key by filtering the original list:

```
groupOn :: Eq k => (a -> k) -> [a] -> [(k,[a])]
= map (\k -> (k, filter ((k==) . key) xs)) (nub (map key xs)) groupOn key xs
```

Using a kind of Schwartzian transform yields the following slight improvement:

```
groupOn :: Eq k => (a -> k) -> [a] -> [(k,[a])]
= map (\k -> (k , map snd (filter ((k==) . fst) ks))) (nub (map fst ks))
groupOn key xs where
= map (\x -> (key x, x)) xs ks
```

But this traverses the same list multiple times unnecessarily. The
problem is that we’re repeating a lot of work between `nub`

and the rest of the algorithm.

The following is much better:

```
groupOn :: Eq k => (a -> k) -> [a] -> [(k,[a])]
= go . map (\x -> (key x, x))
groupOn key where
= []
go [] :xs) = (k,x:map snd y) : go ys
go ((k,x)where
= partition ((k==).fst) xs (y,ys)
```

First, we perform the Schwartzian transform optimisation. The work of
the algorithm is done in the `go`

helper. The idea is to
filter out duplicates as we encounter them: when we encounter
`(k,x)`

we can keep it immediately, but then we split the
rest of the list into the components that have the same key as this
element, and the ones that differ. The ones that have the same key can
form the collection for this key, and those that differ are what we
recurse on.

This partitioning also avoids re-traversing elements we know to be
already accounted for in a previous group. I think that this is the most
efficient (modulo some inlining and strictness improvements) algorithm
that can do `groupOn`

with just an `Eq`

constraint.

The reason that the `groupOn`

above is slow is that every
element returned has to traverse the entire rest of the list to remove
duplicates. This is a classic pattern of quadratic behaviour: we can
improve it by using the same trick as quick sort, by partitioning the
list into lesser and greater elements on every call.

```
groupOnOrd :: Ord k => (a -> k) -> [a] -> [(k,[a])]
= go . map (\x -> (key x, x))
groupOnOrd key where
= []
go [] :xs) = (k,x:e) : go lt ++ go gt
go ((k,x)where
= foldr split ([],[],[]) xs
(e,lt,gt) @(k',y) ~(e,lt,gt) = case compare k' k of
split kyLT -> (e, ky:lt, gt)
EQ -> (y:e, lt, gt)
GT -> (e, lt, ky:gt)
```

While this is $\mathcal{O}(n \log n)$, and it does group elements, it also reorders the underlying list. Let’s fix that by tagging the incoming elements with their positions, and then using those positions to order them back into their original configuration:

```
groupOnOrd :: Ord k => (a -> k) -> [a] -> [(k,[a])]
= map (\(_,k,xs) -> (k,xs)) . go . zipWith (\i x -> (i, k x, x)) [0..]
groupOnOrd k where
= []
go [] :xs) = (i, k, x : e) : merge (go l) (go g)
go ((i, k, x)where
= foldr split ([],[],[]) xs
(e, l, g)
@(_,k',y) ~(e, l, g) = case compare k' k of
split kyLT -> (e , ky : l, g)
EQ -> (y:e, l, g)
GT -> (e , l, ky : g)
= gt
merge [] gt = lt
merge lt [] @(i,_,_):lt) (g@(j,_,_):gt)
merge (l| i <= j = l : merge lt (g:gt)
| otherwise = g : merge (l:lt) gt
```

This is close, but still not right. This isn’t yet *lazy*. The
`merge`

function is strict in both arguments.

However, we have all the information we need to unshuffle the lists
without having to inspect them. In `split`

, we know which
direction we put each element: we can store that info without using
indices.

```
groupOnOrd :: Ord k => (a -> k) -> [a] -> [(k,[a])]
= catMaybes . go . map (\x -> (k x, x))
groupOnOrd k where
= []
go [] :xs) = Just (k, x : e) : merge m (go l) (go g)
go ((k,x)where
= foldr split ([],[],[],[]) xs
(e, m, l, g)
@(k',y) ~(e, m, l, g) = case compare k' k of
split kyLT -> ( e, LT : m, ky : l, g)
EQ -> (y:e, EQ : m, l, g)
GT -> ( e, GT : m, l, ky : g)
= []
merge [] lt gt EQ : xs) lt gt = Nothing : merge xs lt gt
merge (LT : xs) (l:lt) gt = l : merge xs lt gt
merge (GT : xs) lt (g:gt) = g : merge xs lt gt merge (
```

What we generate here is a `[Ordering]`

: this list tells
us the result of all the compare operations on the input list. Then, in
`merge`

, we invert the action of `split`

,
rebuilding the original list without inspecting either `lt`

or `gt`

.

And this solution works! It’s $\mathcal{O}(n \log n)$, and fully lazy.

```
>>> map fst . groupOnOrd id $ [1..]
1..]
[
>>> groupOnOrd id $ cycle [1,2,3]
1,repeat 1):(2,repeat 2):(3,repeat 3):⊥
(
>>> groupOnOrd (`rem` 3) [1..]
1,[1,4..]):(2,[2,5..]):(0,[3,6..]):⊥ (
```

The finished version of these two functions, along with some benchmarks, is available here.

]]>
Tags: Agda

I haven’t written much on this blog recently: since starting a PhD all of my writing output has gone towards paper drafts and similar things. Recently, though, I’ve been thinking about streams, monoids, and comonads and I haven’t manage to wrangle those thoughts into something coherent enough for a paper. This blog post is a collection of those (pretty disorganised) thoughts. The hope is that writing them down will force me to clarify things, but consider this a warning that the rest of this post may well be muddled and confusing.

The first thing I want to talk about is streams.

record Stream (A : Type) : Type where coinductive field head : A tail : Stream A

This representation is *coinductive*: the type above contains
infinite values. Agda, unlike Haskell, treats inductive and coinductive
types differently (this is why we need the `coinductive`

keyword in the definition). One of the differences is that it doesn’t
check termination for construction of these values:

alternating : Stream Bool alternating .head = true alternating .tail .head = false alternating .tail .tail = alternating

```
alternating :: [Bool]
= True : False : alternating alternating
```

We have the equivalent in Haskell on the right. We’re also using some fancy syntax for the Agda code: copatterns (Abel and Pientka 2013).

Note that this type is only definable in a language with some notion
of laziness. If we tried to define a value like `alternating`

above in OCaml we would loop. Haskell has no problem, and Agda—through
its coinduction mechanism—can handle it as well.

Update 4-5-22: thanks to Arnaud Spiwack (@aspiwack)
for correcting me on this, it turns out the definition of
`alternating`

above *can* be written in Ocaml, even
without laziness. Apparently Ocaml has a facility for strict cyclic data
structures. Also, I should be a little more precise with what I’m saying
above: even without the extra facility for strict cycles, you can of
course write a lazy list with some kind of lazy wrapper type.

There is, however, an isomorphic type that can be defined without coinduction:

(notice that, in this form, the function `ℕ-alternating`

is the same function as `even : ℕ → Bool`

)

In fact, we can convert from the coinductive representation to the inductive one. This conversion function is more familiarly recognisable as the indexing function:

_[_] : Stream A → ℕ-Stream A xs [ zero ] = xs .head xs [ suc n ] = xs .tail [ n ]

I’m not just handwaving when I say the two representations are isomorphic: we can prove this isomorphism, and, in Cubical Agda, we can use this to transport programs on one representation to the other.

tabulate : ℕ-Stream A → Stream A tabulate xs .head = xs zero tabulate xs .tail = tabulate (xs ∘ suc) stream-rinv : (xs : Stream A) → tabulate (xs [_]) ≡ xs stream-rinv xs i .head = xs .head stream-rinv xs i .tail = stream-rinv (xs .tail) i stream-linv : (xs : ℕ-Stream A) (n : ℕ) → tabulate xs [ n ] ≡ xs n stream-linv xs zero = refl stream-linv xs (suc n) = stream-linv (xs ∘ suc) n stream-reps : ℕ-Stream A ⇔ Stream A stream-reps .fun = tabulate stream-reps .inv = _[_] stream-reps .rightInv = stream-rinv stream-reps .leftInv xs = funExt (stream-linv xs)

One final observation about streams: another way to define a stream is as the cofree comonad of the identity functor.

record Cofree (F : Type → Type) (A : Type) : Type where coinductive field root : A step : F (Cofree F A) 𝒞-Stream : Type → Type 𝒞-Stream = Cofree id

Concretely, the `Cofree F A`

type is a possibly infinite
tree, with branches shaped like `F`

, and internal nodes
labelled with `A`

. It has the following characteristic
function:

{-# NON_TERMINATING #-} trace : ⦃ _ : Functor 𝐹 ⦄ → (A → B) → (A → 𝐹 A) → A → Cofree 𝐹 B trace ϕ ρ x .root = ϕ x trace ϕ ρ x .step = map (trace ϕ ρ) (ρ x)

Like how the free monad turns any functor into a monad, the cofree
comonad turns any functor into a comonad. Comonads are less popular and
widely-used than monads, as there are less well-known examples of them.
I have found it helpful to think about comonads through spatial
analogies. A lot of comonads can represent a kind of walk through some
space: the `extract`

operation tells you “what is immediately
here”, and the `duplicate`

operation tells you “what can I
see from each point”. For the stream, these two operations are inhabited
by `head`

and the following:

duplicate : Stream A → Stream (Stream A) duplicate xs .head = xs duplicate xs .tail = duplicate (xs .tail)

There were three key observations in the last section:

- Streams are coinductive. This requires a different termination checker in Agda, and a different evaluation model in strict languages.
- They have an isomorphic representation based on
*indexing*. This isomorphic representation doesn’t need coinduction or laziness. - They are a special case of the cofree comonad.

Going forward, we’re going to look at generalisations of streams, and we’re going to see what these observations mean in the contexts of the new generalisations.

The thing we’ll be generalising is the index of the stream.
Currently, streams are basically structures that assign a value to every
`ℕ`

: what does a stream of—for instance—rational numbers look
like? To drive the intuition for this generalisation let’s first look at
the comonad instance on the `ℕ-Stream`

type:

ℕ-extract : ℕ-Stream A → A ℕ-extract xs = xs zero ℕ-duplicate : ℕ-Stream A → ℕ-Stream (ℕ-Stream A) ℕ-duplicate xs zero = xs ℕ-duplicate xs (suc n) = ℕ-duplicate (xs ∘ suc) n

This is the same instance as is on the `Stream`

type,
transported along the isomorphism between the two types (we could have
transported the instance automatically, using `subst`

or
`transport`

; I have written it out here manually in full for
illustration purposes).

The `ℕ-duplicate`

method here can changed a little to
reveal something interesting:

ℕ-duplicate₂ : ℕ-Stream A → ℕ-Stream (ℕ-Stream A) ℕ-duplicate₂ xs zero m = xs m ℕ-duplicate₂ xs (suc n) m = ℕ-duplicate₂ (xs ∘ suc) n m ℕ-duplicate₃ : ℕ-Stream A → ℕ-Stream (ℕ-Stream A) ℕ-duplicate₃ xs n m = xs (go n m) where go : ℕ → ℕ → ℕ go zero m = m go (suc n) m = suc (go n m) ℕ-duplicate₄ : ℕ-Stream A → ℕ-Stream (ℕ-Stream A) ℕ-duplicate₄ xs n m = xs (n + m)

In other words, `duplicate`

basically adds indices.

There is something distinctly *monoidal* about what’s going on
here: taking the `(ℕ, +, 0)`

monoid as focus, the
`extract`

method above corresponds to the monoidal empty
element, and the `duplicate`

method corresponds to the binary
operator on monoids. In actual fact, there is a comonad for any function
from a monoid, often called the `Traced`

comonad.

Traced : Type → Type → Type Traced E A = E → A extractᵀ : ⦃ _ : Monoid E ⦄ → Traced E A → A extractᵀ xs = xs ε duplicateᵀ : ⦃ _ : Monoid E ⦄ → Traced E A → Traced E (Traced E A) duplicateᵀ xs e₁ e₂ = xs (e₁ ∙ e₂)

The second observation we made about streams was that they had an
isomorphic representation which didn’t need coinduction. What we can see
above, with `Traced`

, is a representation that *also*
doesn’t need coinduction. So what is the corresponding coinductive
representation? What does a generalised *reified* stream look
like?

So the first approach to reifying a function to a data structure is to simply represent the function as a list of pairs.

C-Traced : Type → Type → Type C-Traced E A = Stream (E × A)

This representation obviously isn’t ideal: it isn’t possible to
construct an isomorphism between `C-Traced`

and
`Traced`

. We can—kind of—go in one direction, but even that
function isn’t terminating:

{-# NON_TERMINATING #-} lookup-env : ⦃ _ : IsDiscrete E ⦄ → C-Traced E A → Traced E A lookup-env xs x = if does (x ≟ xs .head .fst) then xs .head .snd else lookup-env (xs .tail) x

I’m not too concerned with being fast and loose with termination and isomorphisms for the time being, though. At the moment, I’m just interested in exploring the relationship between streams and the indexing functions.

As a result, let’s try and push on this representation a little and
see if it’s possible to get something interesting and *almost*
isomorphic.

To get a slightly nicer representation we can exploit the monoid a
little bit. We can do this by storing *offsets* instead of the
absolute indices for each entry. The data structure I have in mind here
looks a little like this:

┏━━━━━━━━━━┳━━━━━━┳━━━━━━┉ ┃x ┃y ┃z ┉ ┡━━━━━━━━━━╇━━━━━━╇━━━━━━┉ ╵⇤a╌╌╌╌╌╌╌⇥╵⇤b╌╌╌⇥╵⇤c╌╌╌╌┈

Above is a stream containing the values `x`

,
`y`

, and `z`

. Instead of each value corresponding
to a single entry in the stream, however, they each correspond to a
*segment*. The value `x`

, for instance, labels the
first segment in the stream, which has a length given by `a`

.
`y`

labels the second segment, with length `b`

,
`z`

with length `c`

, and so on.

The `Traced`

version of the above structure might be
something like this:

```
str :: Traced m a
| i < a = x
str i | i < a + b = y
| i < a + b + c = z
| ...
```

So the index-value mapping is also segmented. The stream, in this way, is kind of like a ruler, where different values mark out different quantities along the ruler, and the index function takes in a quantity and tells you which entry in the ruler that quantity corresponds to.

In code, we might represent the above data structure with the following type:

record Segments (E : Type) (A : Type) : Type where field length : E label : A next : Segments E A open Segments

The question is, then, how do we convert *this* structure to
an `Traced`

representation?

We need some extra operations on the monoid in the segments in order
to enable this conversion to the `Traced`

representation. The
extra operations are encapsulated by the monus algebra: I wrote about
this in the paper I submitted with Nicolas Wu to ICFP last year (2021). It’s a
simple algebra on monoids which basically encapsulates monoids which are
ordered in a sensible way.

The basic idea is that we construct an order on monoids which says “x is smaller than y if there is some z that we can add to x to get to y”.

_≼_ : ⦃ _ : Monoid A ⦄ → A → A → Type _ x ≼ y = ∃ z × (y ≡ x ∙ z)

A monus is a monoid where we can extract that `z`

, when it
exists. On the monoid `(ℕ, +, 0)`

, for instance, this order
corresponds to the normal ordering on `ℕ`

.

Extracting the `z`

above corresponds to a kind of
difference operator:

_∸_ : ℕ → ℕ → ℕ x ∸ zero = x suc x ∸ suc y = x ∸ y _ ∸ _ = zero

This operator is sometimes called the monus. It is a kind of partial, or truncating, subtraction:

_ : 5 ∸ 2 ≡ 3 _ = refl _ : 2 ∸ 5 ≡ 0 _ = refl

And, indeed, this operator “extracts” the `z`

, when it
exists.

∸‿is-monus : ∀ x y → (x≼y : x ≼ y) → y ∸ x ≡ fst x≼y ∸‿is-monus zero _ (z , y≡0+z) = y≡0+z ∸‿is-monus (suc x) (suc y) (z , y≡x+z) = ∸‿is-monus x y (z , suc-inj y≡x+z) ∸‿is-monus (suc x) zero (z , 0≡x+z) = ⊥-elim (zero≢suc 0≡x+z)

Our definition of a monus is simple: a monus is anything where the
order ≼, sometimes called the “algebraic preorder”, is total and
antisymmetric. This is precisely what lets us write a function which
takes the `Segments`

type and converts it back to the
`Traced`

type.

{-# NON_TERMINATING #-} Segments→Traced : ⦃ _ : Monus E ⦄ → Segments E A → Traced E A Segments→Traced xs i with xs .length ≤? i ... | yes (j , i≡xsₗ∙j) = Segments→Traced (xs .next) j ... | no _ = xs .label

This function takes an index, and checks if that length is greater than or equal to the first segment in the stream of segments. If it is, then it continues searching through the rest of the segments with the index reduced by the size of that first segment. If not, then it returns the label of the first segment.

Taking the old example, we are basically converting to ∸ from +:

```
str :: Traced m a
| i < a = x
str i | i ∸ a < b = y
| i ∸ a ∸ b < c = z
| ...
```

The first issue here is that this definition is not terminating. That might seem an insurmountable problem at first—we are searching through an infinite stream, after all—but notice that there is one paremeter which is decreasing on each recursive call: the index. Well, it only decreases if the segment is non-zero: this can be enforced by changing the definition of the segments type:

record ℱ-Segments (E : Type) ⦃ _ : Monus E ⦄ (A : Type) : Type where coinductive field label : A length : E length≢ε : length ≢ ε next : ℱ-Segments E A open ℱ-Segments

This type allows us to write the following definition:

module _ ⦃ _ : Monus E ⦄ (wf : WellFounded _≺_) where wf-index : ℱ-Segments E A → (i : E) → Acc _≺_ i → A wf-index xs i a with xs .length ≤? i ... | no _ = xs .label wf-index xs i (acc wf) | yes (j , i≡xsₗ∙j) = wf-index (xs .next) j (wf j (xs .length , i≡xsₗ∙j ; comm _ _ , xs .length≢ε)) ℱ-Segments→Traced : ℱ-Segments E A → Traced E A ℱ-Segments→Traced xs i = wf-index xs i (wf i)

So the `ℱ-Segments`

type is interesting, but it only
really gives one side of the isomorphism. There is no way to write a
function `Traced E A → ℱ-Segments E A`

.

The problem is that there’s no way to get the “next” segment from a
function `E → A`

. We can find the label of the first segment,
by applying the function to `ε`

, but there’s no real way to
figure out the *size* of this segment. We can change
`Traced`

little to *provide* this size, though.

Ind : ∀ E → ⦃ _ : Monus E ⦄ → Type → Type Ind E A = E → A × Σ[ length ⦂ E ] × (length ≢ ε)

This new type will return a tuple consisting of the value indicated
by the supplied index, along with the distance to the next segment. For
instance, on the example stream given in the diagram earlier, supplying
an index `i`

that is bigger than `a`

but smaller
than `a + b`

, this function should return `y`

along with some `j`

such that `i + j ≡ a + b`

.
Diagrammatically:

╷⇤i╌╌╌╌╌╌╌╌⇥╷⇤j╌╌⇥╷ ┢━━━━━━━━┳━━┷━━━━━╈━━━━━━┉ ┃x ┃y ┃z ┉ ┡━━━━━━━━╇━━━━━━━━╇━━━━━━┉ ╵⇤a╌╌╌╌╌⇥╵⇤b╌╌╌╌╌⇥╵⇤c╌╌╌╌┈

This can be implemented in code like so:

module _ ⦃ _ : Monus E ⦄ where wf-ind : ℱ-Segments E A → (i : E) → Acc _≺_ i → A × ∃ length × (length ≢ ε) wf-ind xs i _ with xs .length ≤? i ... | no xsₗ≰i = let j , _ , j≢ε = <⇒≺ i (xs .length) xsₗ≰i in xs .label , j , j≢ε wf-ind xs i (acc wf) | yes (j , i≡xsₗ∙j) = wf-ind (xs .next) j (wf j (xs .length , i≡xsₗ∙j ; comm _ _ , xs .length≢ε)) ℱ-Segments→Ind : WellFounded _≺_ → ℱ-Segments E A → Ind E A ℱ-Segments→Ind wf xs i = wf-ind xs i (wf i)

Again, if the monus has finite descending chains, this function is terminating. And the nice thing about this is that it’s possible to write a function in the other direction:

Ind→ℱ-Segments : ⦃ _ : Monus E ⦄ → Ind E A → ℱ-Segments E A Ind→ℱ-Segments ind = let x , s , s≢ε = ind ε in λ where .label → x .length → s .length≢ε → s≢ε .next → Ind→ℱ-Segments (ind ∘ (s ∙_))

The problem here is that this isomorphism is only half correct. We
can prove that converting to `Ind`

and back is the identity,
but not the other direction. There are too many functions in
`Ind`

.

Nonetheless, it’s still interesting!

There is a comonad on state (Waern 2018; Kmett
2018) that is different from store. Notice that above the
`Ind`

type has the same type (almost) as
`State E A`

.

This is interesting in two ways: first, it gives some concrete, spatial intuition for what’s going on with the state comonad.

Second, it gives a kind of interesting *monad* instance on the
stream. If we apply the `Ind→ℱ-Segments`

function to the
implementation of `join`

on state, we *should* get a
`join`

on `ℱ-Segments`

. And we do!

First, we need to redefine `Ind`

to the following:

𝒜-Ind : ∀ E → ⦃ _ : Monus E ⦄ → Type → Type 𝒜-Ind E A = (i : E) → A × Σ[ length ⦂ E ] × (i ≺ length)

This is actually isomorphic to the previous definition, but we return the absolute value of the next segment, rather than the distance to the next segment.

𝒜-iso : ⦃ _ : Monus E ⦄ → 𝒜-Ind E A ⇔ Ind E A 𝒜-iso .fun xs i = let x , s , k , s≡i∙k , k≢ε = xs i in x , k , k≢ε 𝒜-iso .inv xs i = let x , s , s≢ε = xs i in x , i ∙ s , s , refl , s≢ε 𝒜-iso .rightInv _ = refl 𝒜-iso .leftInv xs p i = let x , s , k , s≡i∙k , k≢ε = xs i in x , s≡i∙k (~ p) , k , (λ q → s≡i∙k (~ p ∨ q)) , k≢ε

The implemention of `join`

on this type is the
following:

𝒜-join : ⦃ _ : Monus E ⦄ → 𝒜-Ind E (𝒜-Ind E A) → 𝒜-Ind E A 𝒜-join xs i = let x , j , i<j = xs i y , k , k<j = x j in y , k , ≺-trans i<j k<j

This is the same definition of `join`

as for
`State`

, modulo the `<`

fiddling.

On a stream, this operation corresponds to taking a stream of streams and collapsing it to a single stream. It does this by taking a prefix of each internal stream equal in size to the segment of the outer entry. Diagrammatically:

┏━━━━━━━━━━┳━━━━━━┳━━━━━━┉ ┃xs ┃ys ┃zs ┉ ┡━━━━━━━━━━╇━━━━━━╇━━━━━━┉ ╵⇤a╌╌╌╌╌╌╌⇥╵⇤b╌╌╌⇥╵⇤c╌╌╌╌┈ ╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲ ╱ ╲ ╷⇤b╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌⇥╷ ┢━━━━━━━┳━━━━━━┳━━━┷┉ ys = ┃xʸ ┃yʸ ┃zʸ ┉ ┡━━━━━━━╇━━━━━━╇━━━━┉ ╵⇤aʸ╌╌╌⇥╵⇤bʸ╌╌⇥╵⇤cʸ╌┈

Here we start with a stream consisting of the streams
`xs`

, `ys`

, and `zs`

, followed by some
other streams. Zooming in on `ys`

, we see that it is in a
segment of length `b`

, and consists of three values
`xʸ`

, `yʸ`

, and `zʸ`

, with segment
lengths `aʸ`

, `bʸ`

, and `cʸ`

,
respectively.

Calling `join`

on this stream will give us the following
stream:

┏━┉━┳━━━━┳━━━━┳━━━━━┳━━━━┉ ┃ ┉ ┃xʸ ┃yʸ ┃zʸ ┃ ┉ ┡━┉━╇━━━━╇━━━━╇━━━━━╇━━━━┉ │ │⇤aʸ⇥╵⇤bʸ⇥╵⇤╌╌┈⇥│ ╵⇤a⇥╵⇤b╌╌╌╌╌╌╌╌╌╌╌╌⇥╵⇤c╌╌┈

Again, we’re focusing on the `ys`

section here, which
occupies the segment from `a`

to `a ∙ b`

. After
`join`

, this segment is occupied by three elements,
`xʸ`

, `yʸ`

, and `zʸ`

.

Notice that this isn’t quite the normal `join`

on streams.
That `join`

takes a stream of streams, and turns the
`i`

th entry into the `i`

th entry in the underlying
stream. It’s a diagonalisation, in other words.

This one is kind of similar, but it takes chunks of the outer stream.

All of this so far is very hand-wavy. We have an almost isomorphism (a split surjection, to be precise), but not much in the way of concrete theoretical insights, just some vague gesturing towards spatial metaphors and so on.

Thankfully, there are two seperate areas of more serious research that seem related to the stuff I’ve talked about here. The first is update monads and directed containers, and the second is graded comonads. I think I understand graded comonads and the related work better out of the two, but update monads and directed containers seems more closely related to what I’m doing here.

There are a few papers on this topic: Ahman, Chapman, and Uustalu (2012), Ahman and Uustalu (2013; Ahman and Uustalu 2014; Ahman and Uustalu 2016).

The first of these, “When Is a Container a Comonad?” constructs, as the title suggests, a class for containers which are comonads in a standard way.

Here’s the definition of a container:

Container : Type₁ Container = Σ[ Shape ⦂ Type ] × (Shape → Type) ⟦_⟧ : Container → Type → Type ⟦ S , P ⟧ X = Σ[ s ⦂ S ] × (P s → X)

Containers are a generic way to describe a class of well-behaved
functors. Any container is a pair of a shape and position. Lists, for
instance, are containers, where their shape is described by the natural
numbers (the shape here is the length of the list). The positions in
such a list are the numbers smaller than the length, in
dependently-typed programming we usually use the `Fin`

type
for this:

Fin : ℕ → Type Fin n = ∃ m × (m <ℕ n)

The container version of lists, then, is the following:

ℒ𝒾𝓈𝓉 : Type → Type ℒ𝒾𝓈𝓉 = ⟦ ℕ , Fin ⟧

Here’s the same list represented in the standard way, and as a container:

The benefit of using containers is that it gives a standard, generic, and composable way to construct functors that have some nice properties (like strict positivity). They’re pretty annoying to use in practice, though, which is a shame.

Directed containers are container that have three extra operations.

- A
`tail`

-like operation, where a position can be converted into the shape of containers that the suffic from that position. - A
`head`

-like operation, where you can always return the root position. - A
`+`

-like operation, where you take a position on some tail and translate it into a position on the original container, by adding it.

As the paper observes, these are very similar to a “dependently-typed” version of the monoid methods. This seems to me to be very similar to the indexing stuff we were doing earlier on.

The real interesting part is in the paper “Updated Monads: Cointerpreting Directed Containers” (Ahman and Uustalu 2014). This paper presents a variant on state monads, called “update monads”.

These are monads that use a monoid action:

record RightAction (𝑃 : Type) (𝑆 : Type) : Type where infixl 5 _↓_ field ⦃ monoid⟨𝑃⟩ ⦄ : Monoid 𝑃 _↓_ : 𝑆 → 𝑃 → 𝑆 ↓-assoc : ∀ x y z → (x ↓ y) ↓ z ≡ x ↓ (y ∙ z) ↓-ε : ∀ x → x ↓ ε ≡ x

A (right) monoid action is a monoid along with a function
`↓`

that “acts” on some other set, in a way that coheres with
the monoid methods. The definition is given above. One way to think
about it is that if a monoid `𝑃`

has an action on
`𝑆`

it means that elements of `𝑃`

can kind of be
transformed into elements of `𝑆 → 𝑆`

.

Upd : (𝑃 𝑆 : Type) ⦃ _ : RightAction 𝑃 𝑆 ⦄ → Type → Type Upd 𝑃 𝑆 X = 𝑆 → 𝑃 × X η : ⦃ _ : RightAction 𝑃 𝑆 ⦄ → A → Upd 𝑃 𝑆 A η x s = ε , x μ : ⦃ _ : RightAction 𝑃 𝑆 ⦄ → Upd 𝑃 𝑆 (Upd 𝑃 𝑆 A) → Upd 𝑃 𝑆 A μ xs s = let p , x = xs s q , y = x (s ↓ p) in (p ∙ q , y)

It turns out that the dependently-typed version of this gives directed containers.

I’m still in the early stages of understanding all of this material, but at the moment graded comonads and transformers are concepts that I’m much more familiar and comfortable with.

The idea behind graded monads and comonads is similar to the idea
behind any indexed monad: we’re adding an extra type parameter to the
monad or type, which can constrain the operations involved. The
*graded* monads and comonads use a monoid as that index. This
works particularly nicely, in my opinion: just allowing any index at all
sometimes feels a little unstructured. The grading construction seems to
constrain things to the right degree: the use of the monoid, as well,
works really well with comonads.

That preamble out of the way, here’s the definition of a graded comonad:

record GradedComonad (𝑆 : Type) ⦃ _ : Monoid 𝑆 ⦄ (𝐶 : 𝑆 → Type → Type) : Type₁ where field extract : 𝐶 ε A → A extend : (𝐶 y A → B) → 𝐶 (x ∙ y) A → 𝐶 x BThis also has a few laws, which are expressed cleaner using cokleisli composition:

_=<=_ : (𝐶 x B → C) → (𝐶 y A → B) → 𝐶 (x ∙ y) A → C (g =<= f) x = g (extend f x) field idˡ : (f : 𝐶 x A₀ → B₀) → PathP (λ i → 𝐶 (ε∙ x i) A₀ → B₀) (extract =<= f) f idʳ : (f : 𝐶 x A₀ → B₀) → PathP (λ i → 𝐶 (∙ε x i) A₀ → B₀) (f =<= extract) f c-assoc : (f : 𝐶 x C₀ → D₀) (g : 𝐶 y B₀ → C₀) (h : 𝐶 z A₀ → B₀) → PathP (λ i → 𝐶 (assoc x y z i) A₀ → D₀) ((f =<= g) =<= h) (f =<= (g =<= h))

This seems to clearly be related to the stream constructions. Grading is all about the monoidal information about a comonad: the streams above are a comonad which indexes its entries with a monoid.

There are now two constructions I want to show that suggest a link
betweent the stream constructions and graded comonads. First of these is
the *Cofree degrading comonad*:

record G-CofreeF (𝐹 : Type → Type) (𝐶 : 𝑆 → Type → Type) (A : Type) : Type where coinductive; constructor _◃_ field here : A step : 𝐹 (∃ w × 𝐶 w (G-CofreeF 𝐹 𝐶 A)) open G-CofreeF G-Cofree : ⦃ _ : Monoid 𝑆 ⦄ → (Type → Type) → (𝑆 → Type → Type) → Type → Type G-Cofree 𝐹 𝐶 A = 𝐶 ε (G-CofreeF 𝐹 𝐶 A)

This construction is similar to the cofree comonad transformer: it is
based on the cofree comonad, but with an extra (graded) comonad wrapped
around each level. For any functor 𝐹 and graded comonad 𝐶,
`G-Cofree 𝐹 𝐶`

is a comonad. The implementation of
`extract`

is simple:

extract′ : ⦃ _ : Monoid 𝑆 ⦄ ⦃ _ : GradedComonad 𝑆 𝐶 ⦄ → G-Cofree 𝐹 𝐶 A → A extract′ = here ∘ extract

`extend`

is more complex. First, we need a version of
`extend`

which takes a proof that the grade is of the right
form:

module _ { 𝐶 : 𝑆 → Type → Type } where extend[_] : ⦃ _ : Monoid 𝑆 ⦄ ⦃ _ : GradedComonad 𝑆 𝐶 ⦄ → x ∙ y ≡ z → (𝐶 y A → B) → 𝐶 z A → 𝐶 x B extend[ p ] k = subst (λ z → 𝐶 z _ → _) p (extend k)

Then we can implement the characteristic function on the free
comonad: `traceT`

. On graded comonads it has the following
form:

module Trace ⦃ _ : Monoid 𝑆 ⦄ ⦃ _ : GradedComonad 𝑆 𝐶 ⦄ ⦃ _ : Functor 𝐹 ⦄ where module _ {A B} where {-# NON_TERMINATING #-} traceT : (𝐶 ε A → B) → (𝐶 ε A → 𝐹 (∃ w × 𝐶 w A)) → 𝐶 ε A → G-Cofree 𝐹 𝐶 B traceT ϕ ρ = ψ where ψ : 𝐶 x A → 𝐶 x (G-CofreeF 𝐹 𝐶 B) ψ = extend[ ∙ε _ ] λ x → ϕ x ◃ map (map₂ ψ) (ρ x)

This function is basically the unfold for the free degrading comonad.
If `G-Cofree`

is a internally-labelled tree, then
`ϕ`

above is the labelling function, and `ρ`

is
the “next” function, returning the children for some root.

Using this, we can implement `extend`

:

extend′ : (G-Cofree 𝐹 𝐶 A → B) → G-Cofree 𝐹 𝐶 A → G-Cofree 𝐹 𝐶 B extend′ f = traceT f (step ∘ extract)

The relation between this and the stream is that the stream can be
defined in terms of this:
`Stream W = G-Cofree id (GC-Id W)`

.

Finally, the last construction I want to introduce is the following:

module _ ⦃ _ : Monus 𝑆 ⦄ where data Prefix-F⊙ (𝐹 : Type → Type) (𝐶 : 𝑆 → Type → Type) (i j : 𝑆) (A : Type) : Type where prefix : ((i≤j : i ≤ j) → A × 𝐹 (∃ k × 𝐶 k (Prefix-F⊙ 𝐹 𝐶 k (fst i≤j) A))) → Prefix-F⊙ 𝐹 𝐶 i j A Prefix⊙ : (𝐹 : Type → Type) (𝐶 : 𝑆 → Type → Type) (j : 𝑆) (A : Type) → Type Prefix⊙ 𝐹 𝐶 j A = 𝐶 ε (Prefix-F⊙ 𝐹 𝐶 ε j A) Prefix : (𝐹 : Type → Type) (𝐶 : 𝑆 → Type → Type) (A : Type) → Type Prefix 𝐹 𝐶 A = ∀ {i} → Prefix⊙ 𝐹 𝐶 i A

This type is designed to mimic sized type definitions. It has an
implicit parameter which can be set, by the user of the type, to some
arbitrary depth. Basically the parameter means “explore to this depth”;
by using the `∀`

we say that it is defined up to any
arbitrary depth.

When the `≺`

relation on the monus is well founded it is
possible to implement `traceT`

:

module _ ⦃ _ : GradedComonad 𝑆 𝐶 ⦄ ⦃ _ : Functor 𝐹 ⦄ (wf : WellFounded _≺_) {A B : Type} where traceT : (𝐶 ε A → B) → (𝐶 ε A → 𝐹 (∃ w × (w ≢ ε) × 𝐶 w A)) → 𝐶 ε A → Prefix 𝐹 𝐶 B traceT ϕ ρ xs = extend[ ∙ε _ ] (λ xs′ → prefix λ _ → ϕ xs′ , map (map₂ (ψ (wf _))) (ρ xs)) xs where ψ : Acc _≺_ y → (x ≢ ε) × 𝐶 x A → 𝐶 x (Prefix-F⊙ 𝐹 𝐶 x y B) ψ (acc wf) (x≢ε , xs) = extend[ ∙ε _ ] (λ x → prefix λ { (k , y≡x∙k) → ϕ x , map (λ { (w , w≢ε , xs) → w , ψ (wf k (_ , y≡x∙k ; comm _ _ , x≢ε)) (w≢ε , xs)}) (ρ x)}) xs

Comonads are much less widely used than monads in Haskell and similar languages. Part of the reason, I think, is that they’re too powerful in a non-linear language. Monads are often used to model sublanguages where it’s possible to introduce “special” variables which interact with the monadic context.

```
= do
pyth <- [1..10]
x <- [1..10]
y <- [1..10]
z *x + y*y == z*z)
guard (xreturn (x,y,z)
```

The `x`

variable here semantically spans over the range
`[1..10]`

. In the following two examples we see the semantics
of state and maybe:

```
sum :: [Int] -> Int
sum xs = flip evalState 0 $ do
0
put $ \x -> do
for_ xs <- get
n + x)
put (n <- get
m return m
```

```
data E = Lit Int | E :+: E | E :/: E
eval :: E -> Maybe Int
Lit n) = n
eval (:+: ys) = do x <- eval xs
eval (xs <- eval ys
y return (x + y)
:/: ys) = do x <- eval xs
eval (xs <- eval ys
y /= 0)
guard (y return (x / y)
```

The variables `n`

and `m`

introduced in the
state example are “special” because their values depend on the
computations that came before. In the maybe example the variables
introduced could be `Nothing`

.

You can’t do the same thing with comonads because you’re always able
to extract the “special” variable with
`extract :: m a -> a`

. Instead of having special variable
*introduction*, comonads let you have special variable
*elimination*. But, since Haskell isn’t linear, you can always
just discard a variable so this isn’t much use.

Looking at the maybe example, we have a function
`eval :: E -> Maybe Int`

that introduces an
`Int`

variable with a “catch”: it is wrapped in a
`Maybe`

. We want to use the `eval`

function as if
it were a normal function `E -> Int`

, with all of the
bookkeeping managed for us: that’s what monads and do notation (kind of)
allow us to do.

An analagous example with comonads might be having a function
`consume :: m V -> String`

. This “handles” a
`V`

value, but the “catch” is that it needs an `m`

context to do so. If we want to treat the `consume`

function
as if it were a normal function `V -> String`

then
comonads (and
codo notation Orchard and Mycroft 2013) would be a perfect
fit.

The reason that this analagous case doesn’t arise very often is that
we don’t have many handlers that look like `m V -> String`

in Haskell. Why? Because if we want to “handle” a `V`

we can
just discard it: as a non-linear language, you do not need to perform
any ceremony to discard a variable in Haskell.

Graded comonads, though, seem to be much more useful than normal
comonads. I think it is becuase they basically get rid of the
`m a -> a`

function, changing it into a much more
restricted form. In this way, they give a kind of small linear language,
but just for the monoidal type parameter.

And there are a lot of uses for the graded comonads. Above we’ve used
them for termination checking. A recursive function might have the form
`a -> b`

, where `a`

is the thing being recursed
on. If we’re using well-founded recursion to show that it’s terminating,
though, we add an extra parameter, an `Acc _<_`

proof,
turning this function into `Acc _<_ w × a -> b`

. The
`Acc _<_`

here is the graded comonad, and this recursive
function is precisely the “handler”.

Other examples might be privacy or permissions: a function might be
able to work on some value, but only if it has particular
*permission* regarding that value. The permission here is the
monoid.

There are other examples I’m sure, those are just the couple that I have been thinking about.

Abel, Andreas, and Brigitte Pientka. 2013. “Wellfounded
Recursion with Copatterns” (0) (June):
25. http://www2.tcs.ifi.lmu.de/%7Eabel/icfp13-long.pdf.

Ahman, Danel, James Chapman, and Tarmo Uustalu. 2012. “When
Is a Container a Comonad?”
In *Foundations of Software Science and
Computational Structures*, 74–88. Lecture
Notes in Computer Science. Springer,
Berlin, Heidelberg. doi:10.1007/978-3-642-28729-9_5.

Ahman, Danel, and Tarmo Uustalu. 2013. “Distributive laws of
directed containers.” *Progress in Informatics* (10)
(March): 3. doi:10.2201/NiiPi.2013.10.2.

———. 2014. “Update Monads: Cointerpreting
Directed Containers”: 23 pages. doi:10.4230/LIPICS.TYPES.2013.1.

———. 2016. “Directed Containers as
Categories” (April). doi:10.4204/EPTCS.207.5.

Kidney, Donnacha Oisín, and Nicolas Wu. 2021. “Algebras for
weighted search.” *Proceedings of the ACM on Programming
Languages* 5 (ICFP) (August): 72:1–72:30. doi:10.1145/3473577.

Kmett, Edward. 2018. “The State Comonad.”
Blog. *The Comonad.Reader*. http://comonad.com/reader/2018/the-state-comonad/.

Orchard, Dominic, and Alan Mycroft. 2013. “A Notation
for Comonads.” In *Implementation and
Application of Functional Languages*, ed
by. Ralf Hinze, 1–17. Lecture Notes in Computer
Science. Berlin, Heidelberg: Springer.
doi:10.1007/978-3-642-41582-1_1.

Waern, Love. 2018. “I made a monad that I haven’t
seen before, and I have a few questions about it.”
Reddit {{Post}}. *reddit.com/r/haskell*. https://www.reddit.com/r/haskell/comments/7oav51/i_made_a_monad_that_i_havent_seen_before_and_i/.

Tags: Haskell

I have packaged up the more interesting bits from the Algebras for Weighted Search paper and put it up on hackage.

You can see it here.

It contains the `HeapT`

monad, the `Monus`

class, and an implementation of Dijkstra’s algorithm, the Viterbi
algorithm, and probabilistic parsing.

Check it out!

]]>The paper “Algebras for Weighted Search” has just been accepted unconditionally to ICFP. I wrote it with my supervisor, Nicolas Wu, and it covers a lot of the topics I’ve written about on this blog (including hyperfunctions and breadth-first traversals).

The preprint is available here.

]]>
Tags: Haskell

Check out this type:

`newtype a -&> b = Hyp { invoke :: (b -&> a) -> b } `

This a hyperfunction (J. Launchbury, Krstic, and Sauerwein 2013; 2000; 2000), and I think it’s one of the weirdest and most interesting newtypes you can write in Haskell.

The first thing to notice is that the recursion pattern is weird. For
a type to refer to itself recursively on the *left* of a function
arrow is pretty unusual, but on top of that the recursion is
*non-regular*. That means that the recursive reference has
different type parameters to its parent: `a -&> b`

is
on the left-hand-side of the equals sign, but on the right we refer to
`b -&> a`

.

Being weird is reason enough to write about them, but what’s really
shocking about hyperfunctions is that they’re *useful*. Once I
saw the definition I realised that a bunch of optimisation code I had
written (to fuse away zips in particular) was actually using
hyperfunctions (Ghani et al. 2005). After that,
I saw them all over the place: in coroutine implementations, queues,
breadth-first traversals, etc.

Anyways, since coming across hyperfunctions a few months ago I
thought I’d do a writeup on them. I’m kind of surprised they’re not more
well-known, to be honest: they’re like a slightly more enigmatic `Cont`

monad, with a far cooler name. Let’s get into it!

The newtype noise kind of hides what’s going on with hyperfunctions: expanding the definition out might make things slightly clearer.

```
type a -&> b = (b -&> a) -> b
= ((a -&> b) -> a) -> b
= (((b -&> a) -> b) -> a) -> b
= ((((... -> b) -> a) -> b) -> a) -> b
```

So a value of type `a -&> b`

is kind of an
infinitely left-nested function type. One thing worth noticing is that
all the `a`

s are in negative positions and all the
`b`

s in positive. This negative and positive business
basically refers to the position of arguments in relation to a function
arrow: to the left are negatives, and to the right are positives, but
two negatives cancel out.

```
... -> b) -> a) -> b) -> a) -> b
((((+ - + - +
```

All the things in negative positions are kind of like the things a
function “consumes”, and positive positions are the things “produced”.
It’s worth fiddling around with very nested function types to get a feel
for this notion. For hyperfunctions, though, it’s enough to know that
`a -&> b`

does indeed (kind of) take in a bunch of
`a`

s, and it kind of produces `b`

s.

By the way, one of the ways to get to grips with polarity in this
sense is to play around with the Cont monad, codensity monad, or
selection monad (Hedges 2015). If you do, you
may notice one of the interesting parallels about hyperfunctions: the
type `a -&> a`

is in fact the fixpoint of the
continuation monad (`Fix (Cont a)`

). Suspicious!

Before diving further into the properties of the type itself, I’d like to give some examples of how it can show up in pretty standard optimisation code.

Let’s say you wanted to write `zip`

with
`foldr`

(I have already described this particular algorithm
in a previous
post). Not `foldr`

on the left argument, mind you, but
`foldr`

on *both*. If you proceed mechanically,
replacing every recursive function with `foldr`

, you can
actually arrive at a definition:

```
zip :: [a] -> [b] -> [(a,b)]
zip xs ys = foldr xf xb xs (foldr yf yb ys)
where
= yk x xk
xf x xk yk = []
xb _
= (x,y) : xk yk
yf y yk x xk = [] yb _ _
```

In an untyped language, or a language with recursive types, such a definition would be totally fine. In Haskell, though, the compiler will complain with the following:

```
• Occurs check: cannot construct the infinite type:
t0 ~ a -> (t0 -> [(a, b)]) -> [(a, b)]
```

Seasoned Haskellers will know, though, that this is not a type error:
no, this is a type *recipe*. The compiler is telling you what
parameters it wants you to stick in the newtype:

```
newtype Zip a b =
Zip { runZip :: a -> (Zip a b -> [(a,b)]) -> [(a,b)] }
zip :: forall a b. [a] -> [b] -> [(a,b)]
zip xs ys = xz yz
where
xz :: Zip a b -> [(a,b)]
= foldr f b xs
xz where
= runZip yk x xk
f x xk yk = []
b _
yz :: Zip a b
= foldr f b ys
yz where
= Zip (\x xk -> (x,y) : xk yk)
f y yk = Zip (\_ _ -> []) b
```

And here we see the elusive hyperfunction: hidden behind a slight
change of parameter order, `Zip a b`

is in fact the same as
`[(a,b)] -&> (a -> [(a,b)])`

.

```
zip :: forall a b. [a] -> [b] -> [(a,b)]
zip xs ys = invoke xz yz
where
xz :: (a -> [(a,b)]) -&> [(a,b)]
= foldr f b xs
xz where
= Hyp (\yk -> invoke yk xk x)
f x xk = Hyp (\_ -> [])
b
yz :: [(a,b)] -&> (a -> [(a,b)])
= foldr f b ys
yz where
= Hyp (\xk x -> (x,y) : invoke xk yk)
f y yk = Hyp (\_ _ -> []) b
```

In another previous post I derived the following function to do a breadth-first traversal of a tree:

```
data Tree a = a :& [Tree a]
newtype Q a = Q { q :: (Q a -> [a]) -> [a] }
bfe :: Tree a -> [a]
= q (f t b) e
bfe t where
f :: Tree a -> Q a -> Q a
:& xs) fw = Q (\bw -> x : q fw (bw . flip (foldr f) xs))
f (x
b :: Q a
= Q (\k -> k b)
b
e :: Q a -> [a]
Q q) = q e e (
```

That `Q`

type there is another hyperfunction.

```
bfe :: Tree a -> [a]
= invoke (f t e) e
bfe t where
f :: Tree a -> ([a] -&> [a]) -> ([a] -&> [a])
:& xs) fw = Hyp (\bw -> x : invoke fw (Hyp (invoke bw . flip (foldr f) xs)))
f (x
e :: [a] -&> [a]
= Hyp (\k -> invoke k e) e
```

One of the problems I had with the above function was that it didn’t terminate: it could enumerate all the elements of the tree but it didn’t know when to stop. A similar program (Allison 2006; described and translated to Haskell in Smith 2009) manages to solve the problem with a counter. Will it shock you to find out this solution can also be encoded with a hyperfunction?

```
= invoke (f t (Hyp b)) e 1
bfe t where
f :: Tree a -> (Int -> [a]) -&> (Int -> [a])
-> (Int -> [a]) -&> (Int -> [a])
:& xs) fw =
f (x Hyp (\bw n -> x : invoke fw (Hyp (\k m -> invoke bw (foldr f k xs) (m+1))) n)
e :: (Int -> [a]) -&> (Int -> [a])
= Hyp (\k -> invoke k e)
e
0 = []
b x = invoke x (Hyp b) (n-1) b x n
```

(my version here is actually a good bit different from the one in Smith 2009, but the basic idea is the same)

Hyperfunctions seem to me to be quite deeply related to coroutines.
At the very least several of the types involved in coroutine
implementations are actual hyperfunctions. The `ProdPar`

and
`ConsPar`

types from Pieters and Schrijvers (2019)
are good examples:

```
newtype ProdPar a b = ProdPar (ConsPar a b -> b)
newtype ConsPar a b = ConsPar (a -> ProdPar a b -> b)
```

`ProdPar a b`

is isomorphic to
`(a -> b) -&> b`

, and `ConsPar a b`

to
`b -&> (a -> b)`

, as witnessed by the following
functions:

`ProdPar`

, `ConsPar`

and hyperfunctions
```
fromP :: ProdPar a b -> (a -> b) -&> b
ProdPar x) = Hyp (x . toC)
fromP (
toC :: b -&> (a -> b) -> ConsPar a b
Hyp h) = ConsPar (\x p -> h (fromP p) x)
toC (
toP :: (a -> b) -&> b -> ProdPar a b
Hyp x) = ProdPar (x . fromC)
toP (
fromC :: ConsPar a b -> b -&> (a -> b)
ConsPar p) = Hyp (\h x -> p x (toP h)) fromC (
```

In fact this reveals a little about what was happening in the
`zip`

function: we convert the left-hand list to a
`ProdPar`

(producer), and the right-hand to a consumer, and
apply them to each other.

Aside from just being kind of weird intuitively, hyperfunctions are
weird *in theory*. Set-theoretically, for instance, you cannot
form the set of `a -&> b`

: if you tried, you’d run
into those pesky size restrictions which stop us from making things like
“the set of all sets”. Haskell types, however, are not sets, precisely
because we can define things like `a -&> b`

.

For slightly different reasons to the set theory restrictions, we can’t define the type of hyperfunctions in Agda. The following will get an error:

```
record _↬_ (A : Type a) (B : Type b) : Type (a ℓ⊔ b) where
inductive; constructor hyp
field invoke : (B ↬ A) → B
```

And for good reason! Agda doesn’t allow recursive types where the recursive call is in a negative position. If we turn off the positivity checker, we can write Curry’s paradox (example proof taken from here):

```
: ⊥ ↬ ⊥
yes? .invoke h = h .invoke h
yes?
: (⊥ ↬ ⊥) → ⊥
no! = h .invoke h
no! h
: ⊥
boom = no! yes? boom
```

Note that this isn’t an issue with the termination checker: the above
example passes all the normal termination conditions without issue (yes,
even if `↬`

is marked as `coinductive`

). It’s
directly because the type itself is not positive.

Interestingly, there is a slightly different, and nearly equivalent, definition of hyperfunctions which doesn’t allow us to write the above proof:

```
record _↬_ (A : Type a) (B : Type b) : Type (a ℓ⊔ b) where
inductive; constructor hyp
field invoke : ((A ↬ B) → A) → B
```

This is basically a slightly expanded out version of the
hyperfunction type, and importantly it’s *positive*. Not
*strictly* positive however, since the recursive call does occur
to the left of a function arrow: it’s just positive, in that it’s to the
left of an even number of function arrows.

I found in a blog post by Sjöberg (2015) some interesting
discussion regarding the question of this extra strictness: in Coq,
allowing certain positive but not *strictly* positive types does
indeed introduce an inconsistency (Coquand and Paulin 1990).
However this inconsistency relies on an impredicative universe, which
Agda doesn’t have. As far as I understand it, it would likely be safe to
allow types like `↬`

above in Agda (Coquand 2013), although I’m not
certain that with all of Agda’s newer features that’s still the
case.

The connection between non-strictly-positive types and breadth-first traversals has been noticed before: Berger, Matthes, and Setzer (2019) make the argument for their inclusion in Agda and Coq using a breadth-first traversal algorithm by Hofmann (1993), which uses the following type:

```
data Rou
= Over
| Next ((Rou -> [Int]) -> [Int])
```

Now this type *isn’t* a hyperfunction (but it’s close); we’ll
see soon what kind of thing it is.

So we’ve seen that hyperfunctions show up kind of incidentally through certain optimisations, and we’ve seen that they occupy a strange space in terms of their theoretical interpretation: we haven’t yet seen much about the type itself in isolation. Luckily Ed Kmett has already written the hyperfunctions package -Kmett (2015), where a laundry list of instances are provided, which can tell us a little more about what hyperfunctions can actually do on their own.

The `Category`

instance gives us the following:

```
instance Category (-&>) where
id = Hyp (\k -> invoke k id)
. g = Hyp (\k -> invoke f (g . k)) f
```

We’ve actually seen the identity function a few times: we used it as the base case for recursion in the breadth-first traversal algorithms.

Composition we actually have used as well but it’s more obscured. An
analogy to help clear things up is to think of hyperfunctions as a kind
of *stack*. `id`

is the empty stack, and we can use
the following function to push items onto the stack:

```
push :: (a -> b) -> a -&> b -> a -&> b
= Hyp (\k -> f (invoke k q)) push f q
```

Understood in this sense, composition acts like a zipping operation on stacks, since we have the following law:

`. push g q ≡ push (f . g) (p . q) push f p `

While we can’t really pop elements off the top of the stack directly,
we can get close with `invoke`

, since it satisfies the
following law:

` invoke (push f p) q ≡ f (invoke q p)`

Along with the `id`

implementation we have, this will let
us run a hyperfunction, basically folding over the contents of the
stack:

```
run :: a -&> a -> a
= invoke f id run f
```

This analogy helps us understand how the breadth-first traversals
worked: the hyperfunctions are kind of like stacks with
$\mathcal{O}(1)$
`push`

and `zip`

, which is precisely what you need
for an efficient breadth-first traversal.

```
bfe :: Tree a -> [a]
= run . f
bfe where
:& xs) = push (x:) (zips (map f xs))
f (x
= foldr (.) id zips
```

Finally, hyperfunctions are of course monads:

```
instance Monad ((-&>) a) where
>>= f = Hyp (\k -> invoke (f (invoke m (Hyp (invoke k . (>>=f))))) k) m
```

I won’t pretend to understand what’s going on here, but it looks a
little like a nested reader monad. Perhaps there’s some intuition to be
gained from noticing that
`a -&> a ~ Fix (Cont a)`

.

As I said in the introduction I’m kind of surprised there’s not more research out there on hyperfunctions. Aside from the excellent papers by J. Launchbury, Krstic, and Sauerwein (2013) there’s just not much out there. Maybe it’s that there’s not that much theoretical depth to them, but all the same there are some clear questions worth looking into.

For example: is there a hyperfunction monad transformer? Or, failing that, can you thread a monad through the type at any point, and do you get anything interesting out?

I have made a little headway on this question, while fiddling with
one of the `bfe`

definitions above. Basically I wanted to
remove the `Int`

counter for the terminating
`bfe`

, and I wanted to use a `Maybe`

somewhere
instead. I ended up generalising from `Maybe`

to any
`m`

, yielding the following type:

`newtype HypM m a b = HypM { invokeM :: m ((HypM m a b -> a) -> b) }`

This does the job for the breadth-first traversal:

```
= r (f t e)
bfe t where
f :: Tree a -> HypM Maybe [a] [a] -> HypM Maybe [a] [a]
:& xs) fw = HypM (Just (\bw -> x : fromMaybe (\k -> k e) (invokeM fw) (bw . flip (foldr f) xs)))
f (x
e :: HypM Maybe [a] [a]
= HypM Nothing
e
r :: HypM Maybe [a] [a] -> [a]
= maybe [] (\k -> k r) . invokeM r
```

(In fact, when `m`

is specialised to `Maybe`

we
have the same type as `Rou`

)

This type has a very practical use, as it happens, which is related to the church-encoded list monad transformer:

`newtype ListT m a = ListT { runListT :: forall b. (a -> m b -> m b) -> m b -> m b }`

Just like `-&>`

allowed us to write
`zip`

on folds (i.e. using `foldr`

),
`HypM`

will allow us to write `zipM`

on
`ListT`

:

```
zipM :: Monad m => ListT m a -> ListT m b -> ListT m (a,b)
= ListT (\c n ->
zipM xs ys let
= pure (\yk -> yk (HypM xk) x)
xf x xk = pure (\_ -> n)
xb
= pure (\xk x -> c (x, y) (join (invokeM xk <*> yk)))
yf y yk = pure (\_ _ -> n)
yb in join (runListT xs xf xb <*> runListT ys yf yb))
```

I actually think this function could be used to seriously improve the
running time of several of the functions on `LogicT`

:
my reading of them suggests that `interleave`

is
$\mathcal{O}(n^2)$
(or worse), but the zip above could be trivially repurposed to give a
$\mathcal{O}(n)$
`interleave`

. This would also have knock-on effects on, for
instance, `>>-`

and so on.

Another question is regarding the arrows of the hyperfunction. We’ve seen that a hyperfunction kind of adds “stacking” to functions, can it do the same for other arrows? Basically, does the following type do anything useful?

`newtype HypP p a b = HypP { invokeP :: p (HypP p b a) b }`

Along a similar vein, many of the breadth-first enumeration
algorithms seem to use “hyperfunctions over the endomorphism monoid”.
Basically, they all produce hyperfunctions of the type
`[a] -&> [a]`

, and use them quite similarly to how we
would use difference lists. But we know that there are Cayley transforms
in other monoidal categories, for instance in the applicative monoidal
category: can we construct the “hyperfunction” version of those?

Allison, Lloyd. 2006. “Circular Programs and
Self-Referential Structures.”
*Software: Practice and Experience* 19 (2) (October): 99–109.
doi:10.1002/spe.4380190202.
http://users.monash.edu/~lloyd/tildeFP/1989SPE/.

Berger, Ulrich, Ralph Matthes, and Anton Setzer. 2019. “Martin
Hofmann’s Case for
Non-Strictly Positive Data Types.” In
*24th international conference on types for proofs and programs
(TYPES 2018)*, ed by. Peter Dybjer, José Espírito
Santo, and Luís Pinto, 130:22. Leibniz international proceedings in
informatics (LIPIcs). Dagstuhl, Germany:
Schloss DagstuhlLeibniz-Zentrum fuer Informatik. doi:10.4230/LIPIcs.TYPES.2018.1.
http://drops.dagstuhl.de/opus/volltexte/2019/11405.

Coquand, Thierry. 2013. “[Agda] defining coinductive
types.” https://lists.chalmers.se/pipermail/agda/2013/006189.html.

Coquand, Thierry, and Christine Paulin. 1990. “Inductively defined
types.” In *COLOG-88*, ed by. Per Martin-Löf
and Grigori Mints, 50–66. Lecture Notes in Computer
Science. Berlin, Heidelberg: Springer.
doi:10.1007/3-540-52335-9_47.

Ghani, Neil, Patricia Johann, Tarmo Uustalu, and Varmo Vene. 2005.
“Monadic augment and generalised short cut fusion.” In
*Proceedings of the tenth ACM SIGPLAN international
conference on Functional programming*, 294–305.
ICFP ’05. New York, NY, USA: Association
for Computing Machinery. doi:10.1145/1086365.1086403.
https://doi.org/10.1145/1086365.1086403.

Hedges, Jules. 2015. “The selection monad as a CPS
transformation.” *arXiv:1503.06061 [cs]* (March). http://arxiv.org/abs/1503.06061.

Hofmann, Martin. 1993. “Non Strictly Positive
Datatypes in System F.” https://www.seas.upenn.edu/~sweirich/types/archive/1993/msg00027.html.

Kmett, Edward. 2015. “Hyperfunctions:
Hyperfunctions.” https://hackage.haskell.org/package/hyperfunctions.

Krstic, Sava, and John Launchbury. 2000. “A Category
of Hyperfunctions.” http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.2421.

Launchbury, J., S. Krstic, and T. E. Sauerwein. 2013. “Coroutining
Folds with Hyperfunctions.”
*Electron. Proc. Theor. Comput. Sci.* 129 (September): 121–135.
doi:10.4204/EPTCS.129.9. http://arxiv.org/abs/1309.5135.

Launchbury, John, Sava Krstic, and Timothy E. Sauerwein. 2000. *Zip
Fusion with Hyperfunctions*. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.4961.

Pieters, Ruben P., and Tom Schrijvers. 2019. “Faster
Coroutine Pipelines: A Reconstruction.”
In *Practical Aspects of Declarative
Languages*, ed by. José Júlio Alferes and Moa Johansson,
133–149. Lecture Notes in Computer Science.
Cham: Springer International Publishing.
doi:10.1007/978-3-030-05998-9_9.
https://people.cs.kuleuven.be/~tom.schrijvers/portfolio/padl2019.html.

Sjöberg, Vilhelm. 2015. “Why must inductive types be strictly
positive?” *Code and stuff*. https://vilhelms.github.io/posts/why-must-inductive-types-be-strictly-positive/.

Smith, Leon P. 2009. “Lloyd Allison’s
Corecursive Queues: Why Continuations
Matter.” *The Monad.Reader* 14 (14) (July): 28. https://meldingmonads.files.wordpress.com/2009/06/corecqueues.pdf.

Tags: Agda

The final version of my master’s thesis got approved recently so I thought I’d post it here for people who might be interested.

And all of the theorems in the thesis have been formalised in Agda. The code is organised to follow the structure of the pdf here.

The title of the thesis is “Finiteness in Cubical Type Theory”: basically it’s all about formalising the notion of “this type is finite” in CuTT. I also wanted to write something that could serve as a kind of introduction to some components of modern dependent type theory which didn’t go the standard length-indexed vector route.

]]>
Tags: Haskell

The Cayley monoid is well-known in Haskell (difference lists, for
instance, are a specific instance of the Cayley monoid), because it
gives us
$O(1)$
`<>`

. What’s less well known is that it’s also
important in dependently typed programming, because it gives us
definitional associativity. In other words, the type
`x . (y . z)`

is definitionally equal to
`(x . y) . z`

in the Cayley monoid.

```
data Nat = Z | S Nat
type family (+) (n :: Nat) (m :: Nat) :: Nat where
Z + m = m
S n + m = S (n + m)
```

I used a form of the type-level Cayley monoid in a previous post to type vector reverse without proofs. I figured out the other day another way to use it to type tree flattening.

Say we have a size-indexed tree and vector:

```
data Tree (a :: Type) (n :: Nat) :: Type where
Leaf :: a -> Tree a (S Z)
(:*:) :: Tree a n -> Tree a m -> Tree a (n + m)
data Vec (a :: Type) (n :: Nat) :: Type where
Nil :: Vec a Z
(:-) :: a -> Vec a n -> Vec a (S n)
```

And we want to flatten it to a list in $O(n)$ time:

```
treeToList :: Tree a n -> Vec a n
= go xs Nil
treeToList xs where
go :: Tree a n -> Vec a m -> Vec a (n + m)
Leaf x) ks = x :- ks
go (:*: ys) ks = go xs (go ys ks) go (xs
```

Haskell would complain specifically that you hadn’t proven the monoid laws:

```
• Couldn't match type ‘n’ with ‘n + 'Z’
• Could not deduce: (n2 + (m1 + m)) ~ ((n2 + m1) + m)
```

But it seems difficult at first to figure out how we can apply the
same trick as we used for vector reverse: there’s no real way for the
`Tree`

type to hold a function from `Nat`

to
`Nat`

.

To solve this problem we can borrow a trick that Haskellers had to use in the good old days before type families to represent type-level functions: types (or more usually classes) with multiple parameters.

```
data Tree' (a :: Type) (n :: Nat) (m :: Nat) :: Type where
Leaf :: a -> Tree' a n (S n)
(:*:) :: Tree' a n2 n3
-> Tree' a n1 n2
-> Tree' a n1 n3
```

The `Tree'`

type here has three parameters: we’re
interested in the last two. The first of these is actually an argument
to a function in disguise; the second is its result. To make it back
into a normal size-indexed tree, we apply that function to zero:

```
type Tree a = Tree' a Z
three :: Tree Int (S (S (S Z)))
= (Leaf 1 :*: Leaf 2) :*: Leaf 3 three
```

This makes the `treeToList`

function typecheck without
complaint:

```
treeToList :: Tree a n -> Vec a n
= go xs Nil
treeToList xs where
go :: Tree' a x y -> Vec a x -> Vec a y
Leaf x) ks = x :- ks
go (:*: ys) ks = go xs (go ys ks) go (xs
```

Consider the following puzzle:

Given a list of $n$ labels, list all the trees with those labels in order.

For instance, given the labels [1,2,3,4], the answer (for binary trees) is the following:

```
┌1 ┌1 ┌1 ┌1 ┌1
┤ ┤ ┌┤ ┌┤ ┌┤
│┌2 │ ┌2 ││┌2 │└2 │└2
└┤ │┌┤ │└┤ ┤ ┌┤
│┌3 ││└3 │ └3 │┌3 │└3
└┤ └┤ ┤ └┤ ┤
└4 └4 └4 └4 └4
```

This problem (the “enumeration” problem) turns out to be quite fascinating and deep, with connections to parsing and monoids. It’s also just a classic algorithmic problem which is fun to try and solve.

The most general version of the algorithm is on forests of rose trees:

```
data Rose a = a :& Forest a
type Forest a = [Rose a]
```

It’s worth having a go at attempting it yourself, but if you’d just like to see the slick solutions the following is one I’m especially proud of:

```
enumForests :: [a] -> [Forest a]
= foldrM f []
enumForests where
= zipWith ((:) . (:&) x) (inits xs) (tails xs) f x xs
```

In the rest of this post I’ll go through the intuition behind solutions like the one above and I’ll try to elucidate some of the connections to other areas of computer science.

I first came across the enumeration problem when I was writing my master’s thesis: I needed to prove (in Agda) that there were finitely many binary trees of a given size, and that I could list them (this proof was part of a larger verified solver for the countdown problem). My first few attempts were unsuccessful: the algorithm presented in the countdown paper (Hutton 2002) was not structurally recursive, and did not seem amenable to Agda-style proofs.

Instead, I looked for a type which was isomorphic to binary trees, and which might be easier to reason about. One such type is Dyck words.

A “Dyck word” is a string of balanced parentheses.

```
()()
(()())()
(())()
```

It’s (apparently) well-known that these strings are isomorphic to
binary trees (although the imperative descriptions of algorithms which
actually computed this isomorphism addled my brain), but what made them
interesting for me was that they are a *flat* type, structured
like a linked list, and as such should be reasonably straightforward to
prove to be finite.

Our first task, then, is to write down a type for Dyck words. Te following is a first possibility:

```
data Paren = LParen | RParen
type Dyck = [Paren]
```

But this type isn’t correct. It includes many values which
*don’t* represent balanced parentheses, i.e. the expressions
`[LParen,RParen] :: Dyck`

are well-typed. To describe dyck
words properly we’ll need to reach for the GADTs:

```
data DyckSuff (n :: Nat) :: Type where
Done :: DyckSuff Z
Open :: DyckSuff (S n) -> DyckSuff n
Clos :: DyckSuff n -> DyckSuff (S n)
type Dyck = DyckSuff Z
```

The first type here represents suffixes of Dyck words; a value of
type `DyckSuff n`

represents a string of parentheses which is
balanced except for `n`

extraneous closing parentheses.
`DyckSuff Z`

, then, has no extraneous closing parens, and as
such is a proper Dyck word.

```
>>> Open $ Clos $ Open $ Clos $ Done :: Dyck
()()
>>> Clos $ Open $ Clos $ Done :: DyckSuff (S Z)
)()
>>> Open $ Open $ Clos $ Open $ Clos $ Clos $ Open $ Clos $ Done :: Dyck
(()())()
>>> Open $ Open $ Clos $ Clos $ Open $ Clos $ Done :: Dyck
(())()
```

The next task is to actually enumerate these words. Here’s an $O(n)$ algorithm which does just that:

```
enumDyck :: Int -> [Dyck]
= go Zy sz Done []
enumDyck sz where
right :: Natty n -> Int -> DyckSuff n -> [Dyck] -> [Dyck]
go, zero, left,
= zero n m k . left n m k . right n m k
go n m k
Zy 0 k = (k:)
zero = id
zero _ _ _
Sy n) m k = go n m (Open k)
left (Zy _ _ = id
left
0 _ = id
right _ = go (Sy n) (m-1) (Clos k)
right n m k
>>> mapM_ print (enumDyck 3)
"()()()"
"(())()"
"()(())"
"(()())"
"((()))"
```

A variant of this function was what I needed in my thesis: I also
needed to prove that it produced every possible value of the type
`Dyck`

, which was not too difficult.

The difficult part is still ahead, though: now we need to convert between this type and a binary tree.

First, for the conversion algorithms we’ll actually need another GADT:

```
infixr 5 :-
data Stack (a :: Type) (n :: Nat) :: Type where
Nil :: Stack a Z
(:-) :: a -> Stack a n -> Stack a (S n)
```

The familiar length-indexed vector will be extremely useful for the next few bits of code: it will act as a stack in our stack-based algorithms. Here’s one of those algorithms now:

```
dyckToTree :: Dyck -> Tree
= go dy (Leaf :- Nil)
dyckToTree dy where
go :: DyckSuff n -> Stack Tree (S n) -> Tree
Open d) ts = go d (Leaf :- ts)
go (Clos d) (t1 :- t2 :- ts) = go d (t2 :*: t1 :- ts)
go (Done (t :- Nil) = t go
```

This might be familiar: it’s actually shift-reduce parsing dressed up with some types. The nice thing about it is that it’s completely total: all pattern-matches are accounted for here, and when written in Agda it’s clearly structurally terminating.

The function in the other direction is similarly simple:

```
treeToDyck :: Tree -> Dyck
= go t Done
treeToDyck t where
go :: Tree -> DyckSuff n -> DyckSuff n
Leaf = id
go :*: ys) = go xs . Open . go ys . Clos go (xs
```

Much of this stuff has been on my mind recently because of this (2020) video on the computerphile channel, in which Graham Hutton goes through using QuickCheck to test an interesting compiler. The compiler itself is explored more in depth in Bahr and Hutton (2015), where the algorithms developed are really quite similar to those that we have here.

The advantage of the code above is that it’s all *total*: we
will never pop items off the stack that aren’t there. This is a nice
addition, and it’s surprisingly simple to add: let’s see if we can add
it to the compiler presented in the paper.

The first thing we need to change is we need to add a payload to our
tree type: the one above is just the *shape* of a binary tree,
but the language presented in the paper contains values.

```
data Expr (a :: Type) where
Val :: a -> Expr a
(:+:) :: Expr a -> Expr a -> Expr a
```

We’ll need to change the definition of `Dyck`

similarly:

```
data Code (n :: Nat) (a :: Type) :: Type where
HALT :: Code (S Z) a
PUSH :: a -> Code (S n) a -> Code n a
ADD :: Code (S n) a -> Code (S (S n)) a
```

After making it so that these data structures can now store contents, there are two other changes worth pointing out:

- The names have been changed, to match those in the paper. It’s a little clearer now that the Dyck word is a bit like code for a simple stack machine.
- The numbering on
`Code`

has changed. Now, the`HALT`

constructor has a parameter of`1`

(well,`S Z`

), where its corresponding constructor in`Dyck`

(`Done`

) had`0`

. Why is this? I am not entirely sure! To get this stuff to all work out nicely took a huge amount of trial and error, I would love to see a more principled reason why the numbering changed here.

With these definitions we can actually transcribe the
`exec`

and `comp`

functions almost verbatim (from page 11 and 12
of 2015).

```
exec :: Code n Int -> Stack Int (n + m) -> Stack Int (S m)
HALT st = st
exec PUSH v is) st = exec is (v :- st)
exec (ADD is) (t1 :- t2 :- st) = exec is (t2 + t1 :- st)
exec (
comp :: Expr a -> Code Z a
= comp' e HALT
comp e where
comp' :: Expr a -> Code (S n) a -> Code n a
Val x) = PUSH x
comp' (:+: ys) = comp' xs . comp' ys . ADD comp' (xs
```

As I have mentioned, a big benefit of all of this stuff is that it
can be translated into Agda readily. The real benefit of *that*
is that we can show the two representations of programs are fully
isomorphic. I have proven this here:
the proof is surprisingly short (about 20 lines), and the rest of the
code follows the Haskell stuff quite closely. I got the idea for much of
the proof from this
bit of code by Callan McGill
(2020).

I’ll include it here as a reference.

```
open import Prelude
open import Data.Nat using (_+_)
open import Data.Vec.Iterated using (Vec; _∷_; []; foldlN; head)
private
variable
: ℕ
n
--------------------------------------------------------------------------------
-- Binary trees: definition and associated functions
--------------------------------------------------------------------------------
data Tree (A : Type a) : Type a where
_] : A → Tree A
[_*_ : Tree A → Tree A → Tree A
--------------------------------------------------------------------------------
-- Programs: definition and associated functions
--------------------------------------------------------------------------------
data Prog (A : Type a) : ℕ → Type a where
: Prog A 1
halt : A → Prog A (1 + n) → Prog A n
push : Prog A (1 + n) → Prog A (2 + n)
pull
--------------------------------------------------------------------------------
-- Conversion from a Prog to a Tree
--------------------------------------------------------------------------------
: Prog A n → Vec (Tree A) n → Tree A
prog→tree⊙ (v ∷ []) = v
prog→tree⊙ halt (push v is) st = prog→tree⊙ is ([ v ] ∷ st)
prog→tree⊙ (pull is) (t₁ ∷ t₂ ∷ st) = prog→tree⊙ is (t₂ * t₁ ∷ st)
prog→tree⊙
: Prog A zero → Tree A
prog→tree = prog→tree⊙ ds []
prog→tree ds
--------------------------------------------------------------------------------
-- Conversion from a Tree to a Prog
--------------------------------------------------------------------------------
: Tree A → Prog A (suc n) → Prog A n
tree→prog⊙ = push x
tree→prog⊙ [ x ] (xs * ys) = tree→prog⊙ xs ∘ tree→prog⊙ ys ∘ pull
tree→prog⊙
: Tree A → Prog A zero
tree→prog = tree→prog⊙ tr halt
tree→prog tr
--------------------------------------------------------------------------------
-- Proof of isomorphism
--------------------------------------------------------------------------------
: (e : Tree A) (is : Prog A (1 + n)) (st : Vec (Tree A) n) →
tree→prog→tree⊙ (tree→prog⊙ e is) st ≡ prog→tree⊙ is (e ∷ st)
prog→tree⊙ = refl
tree→prog→tree⊙ [ x ] is st (xs * ys) is st = tree→prog→tree⊙ xs _ st ;
tree→prog→tree⊙ (pull is) (xs ∷ st)
tree→prog→tree⊙ ys
: (e : Tree A) → prog→tree (tree→prog e) ≡ e
tree→prog→tree = tree→prog→tree⊙ e halt []
tree→prog→tree e
: (is : Prog A n) (st : Vec (Tree A) n) →
prog→tree→prog⊙ (prog→tree⊙ is st) ≡ foldlN (Prog A) tree→prog⊙ is st
tree→prog = refl
prog→tree→prog⊙ halt st (push i is) st = prog→tree→prog⊙ is ([ i ] ∷ st)
prog→tree→prog⊙ (pull is) (t₁ ∷ t₂ ∷ ts) = prog→tree→prog⊙ is ((t₂ * t₁) ∷ ts)
prog→tree→prog⊙
: (is : Prog A 0) → tree→prog (prog→tree is) ≡ is
prog→tree→prog = prog→tree→prog⊙ is []
prog→tree→prog is
: Prog A zero ⇔ Tree A
prog-iso .fun = prog→tree
prog-iso .inv = tree→prog
prog-iso .rightInv = tree→prog→tree
prog-iso .leftInv = prog→tree→prog prog-iso
```

Another thing I’ll mention is that all of the `exec`

functions presented are *folds*. In particular, they’re
*left* folds. Here’s how we’d rewrite `exec`

to make
that fact clear:

```
foldlCode :: (∀ n. a -> b n -> b (S n))
-> (∀ n. b (S (S n)) -> b (S n))
-> b m
-> Code m a -> b (S Z)
HALT = h
foldlCode _ _ h PUSH x xs) = foldlCode p a (p x h) xs
foldlCode p a h (ADD xs) = foldlCode p a (a h) xs
foldlCode p a h (
shift :: Int -> Stack Int n -> Stack Int (S n)
= x :- xs
shift x xs
reduce :: Stack Int (S (S n)) -> Stack Int (S n)
:- t2 :- st) = t2 + t1 :- st
reduce (t1
execFold :: Code Z Int -> Int
= pop . foldlCode shift reduce Nil execFold
```

I think the “foldl-from-foldr” trick could be a nice way to explain the introduction of continuations in Bahr and Hutton (2015).

It turns out that you can follow relatively straightforward rewriting steps from the Dyck-based enumeration algorithm to get to one which avoids Dyck words entirely:

```
enumTrees :: [a] -> [Expr a]
= fmap (foldl1 (flip (:+:))) . foldlM f []
enumTrees where
= [[Val v]]
f [] v = [[Val v, t1]]
f [t1] v :t2:st) v = (Val v : t1 : t2 : st) : f ((t2 :+: t1) : st) v f (t1
```

Maybe in a future post I’ll go through the derivation of this algorithm.

It turns out that the Dyck-based enumeration can be applied without much difficulty to rose trees as well:

```
data Rose a = a :& Forest a
type Forest a = [Rose a]
dyckToForest :: Dyck -> Forest ()
= go dy ([] :- Nil)
dyckToForest dy where
go :: DyckSuff n -> Stack (Forest ()) (S n) -> Forest ()
Open d) ts = go d ([] :- ts)
go (Clos d) (t1 :- t2 :- ts) = go d ((() :& t2 : t1) :- ts)
go (Done (t :- Nil) = t
go
forestToDyck :: Forest () -> Dyck
= go t Done
forestToDyck t where
go :: Forest () -> DyckSuff n -> DyckSuff n
= id
go [] :& x):xs) = go x . Open . go xs . Clos go ((()
```

And again, following relatively mechanical derivations, we arrive at an elegant algorithm:

```
enumForests :: [a] -> [Forest a]
= foldrM f []
enumForests where
= zipWith ((:) . (:&) x) (inits xs) (tails xs) f x xs
```

While researching this post I found that enumeration of trees has
been studied *extensively* elsewhere: see Knuth (2006), for example, or the
excellent blog post by Tychonievich (2013),
or the entire field of Boltzmann
sampling. This post has only scratched the surface of all of that: I
hope to write much more on the topic in the future.

As I mentioned, the Agda code for this stuff can be found here, I have also put all of the Haskell code in one place here.

Bahr, Patrick, and Graham Hutton. 2015. “Calculating correct
compilers.” *Journal of Functional Programming* 25 (e14)
(September). doi:10.1017/S0956796815000180.
https://nottingham-repository.worktribe.com/output/761112.

Hutton, Graham. 2002. “The Countdown Problem.”
*J. Funct. Program.* 12 (6) (November): 609–616. doi:10.1017/S0956796801004300.
http://www.cs.nott.ac.uk/~pszgmh/countdown.pdf.

Knuth, Donald E. 2006. *The Art of Computer
Programming, Volume 4, Fascicle 4:
Generating All Trees–History of
Combinatorial Generation (Art of
Computer Programming)*. Addison-Wesley
Professional. http://www.cs.utsa.edu/~wagner/knuth/fasc4a.pdf.

McGill, Callan. 2020. “Compiler correctness for addition
language.” https://gist.github.com/Boarders/9d83f9cbcfaffb04cf2464588fc46df9.

Riley, Sean. 2020. “Program Correctness -
Computerphile.” University of
Nottingham. https://www.youtube.com/watch?v=T_IINWzQhow.

Tychonievich, Luther. 2013. “Enumerating
Trees.” *Luther’s Meanderings*. https://www.cs.virginia.edu/~lat7h/blog/posts/434.html.

Part 10 of a 10-part series on Breadth-First Traversals

Tags: Haskell

We pick up the story again at the question of a breadth-first (Applicative) traversal of a rose tree (Gibbons 2015). In the last post, I finally came up with an implementation I was happy with:

```
data Tree a = a :& [Tree a]
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
:& xs) = liftA2 (:&) (f x) (bftF f xs)
bft f (x
bftF :: Applicative f => (a -> f b) -> [Tree a] -> f [Tree b]
= fmap head . foldr (<*>) (pure []) . foldr f [pure ([]:)]
bftF t where
:& xs) (q : qs) = liftA2 c (t x) q : foldr f (p qs) xs
f (x
= [pure ([]:)]
p [] :xs) = fmap (([]:).) x : xs
p (x
: ks) = ((x :& xs) : y) : ys
c x k (xs where (y : ys) = k ks
```

It has the correct semantics and asymptotics.

```
=
tree 1 :&
2 :&
[ 5 :&
[ 9 :& []
[ 10 :& []]
, 6 :& []]
, 3 :& []
, 4 :&
, 7 :&
[ 11 :& []
[ 12 :& []]
, 8 :& []]]
,
>>> bft print tree
1
2
3
4
5
6
7
8
9
10
11
12
:&
() :&
[ () :&
[ () :& []
[ () :& []]
, () :& []]
, () :& []
, () :&
, () :&
[ () :& []
[ () :& []]
, () :& []]] , ()
```

But it’s quite difficult to understand, and doesn’t lend much insight into what’s going on with the whole “breadth-first” notion. The technique the function uses also isn’t reusable.

A much nicer function uses the `Phases`

Applicative (Easterly
2019):

```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= runPhases . go
bft f where
:& xs) = liftA2 (:&) (Lift (f x)) (later (traverse go xs)) go (x
```

But this function is quadratic.

So the task for this post today is to derive a type like the
`Phases`

type with a `later`

operation, but which
has the appropriate performance characteristics. At the end I’ll look
into what the theoretical properties of this type are.

At its core, the `Phases`

type is basically a free
Applicative (Capriotti and Kaposi 2014).
I’ll reimplement it here as a slightly different free Applicative (one
that’s based on `liftA2`

rather than
`<*>`

):

```
data Free f a where
Pure :: a -> Free f a
Lift :: (a -> b -> c) -> f a -> Free f b -> Free f c
lower :: Applicative f => Free f a -> f a
Pure x) = pure x
lower (Lift f x xs) = liftA2 f x (lower xs) lower (
```

The key with the `Phases`

type is to observe that there’s
actually two possible implementations of `Applicative`

for
the `Free`

type above: one which makes it the “correct” free
applicative:

```
instance Applicative (Free f) where
pure = Pure
Pure x) ys = fmap (c x) ys
liftA2 c (Lift f x xs) ys = Lift (\x (y,z) -> c (f x y) z) x (liftA2 (,) xs ys) liftA2 c (
```

And then one which *zips* effects together:

```
instance Applicative f => Applicative (Free f) where
pure = Pure
Pure x) ys = fmap (c x) ys
liftA2 c (Pure y) = fmap (flip c y) xs
liftA2 c xs (Lift f x xs) (Lift g y ys) =
liftA2 c (Lift
-> c (f x xs) (g y ys))
(\(x,y) (xs,ys)
(liftA2 (,) x y) (liftA2 (,) xs ys)
```

This second instance makes the `Free`

type into not a free
Applicative at all: instead it’s some kind of Applicative transformer
which we can use to reorder effects. Since effects are combined only
when they’re at the same point in the list, we can use it to do our
breadth-first traversal.

As an aside, from this perspective it’s clear that this is some kind
of `FunList`

(van Laarhoven 2009):
this opens up a lot of interesting curiosities about the type, since
that type in particular is quite well-studied.

Anyway, we’re able to do the `later`

operation quite
simply:

```
later :: Free f a -> Free f a
= Lift (const id) (pure ()) later
```

The problem at the moment is that the Applicative instance has an
$\mathcal{O}(n)$
`liftA2`

implementation: this translates into an
$\mathcal{O}(n^2)$
traversal overall.

If we were working in a more simple context of just enumerating the contents of the tree, we might at this point look to something like difference lists: these use the cayley transform on the list monoid to turn the append operation from $\mathcal{O}(n)$ to $\mathcal{O}(n^2)$. It turns out that there is a similar cayley transformation for Applicative functors (Rivas and Jaskelioff 2014; Rivas, Jaskelioff, and Schrijvers 2015):

```
newtype Day f a = Day { runDay :: ∀ b. f b -> f (a, b) }
instance Functor f => Functor (Day f) where
fmap f xs = Day (fmap (first f) . runDay xs)
instance Functor f => Applicative (Day f) where
pure x = Day (fmap ((,) x))
=
liftA2 c xs ys Day (fmap (\(x,(y,z)) -> (c x y, z)) . runDay xs . runDay ys)
```

And with this type we can implement our queue of applicative effects:

```
type Queue f = Day (Free f)
runQueue :: Applicative f => Queue f a -> f a
= fmap fst . lower . flip runDay (Pure ())
runQueue
now :: Applicative f => f a -> Queue f a
= Day \case
now xs Pure x -> Lift (,) xs (Pure x)
Lift f y ys -> Lift (\(x,y) z -> (x, f y z)) (liftA2 (,) xs y) ys
later :: Applicative f => Queue f a -> Queue f a
= Day \case
later xs Pure x -> Lift (const id) (pure ()) (runDay xs (Pure x))
Lift f y ys -> Lift (\x (y,z) -> (y, f x z)) y (runDay xs ys)
```

As expected, this gives us the clean implementation of a breadth-first traversal with the right asymptotics (I think):

```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= runQueue . go
bft f where
:& xs) = liftA2 (:&) (now (f x)) (later (traverse go xs)) go (x
```

(it’s worth pointing out that we haven’t actually used the applicative instance on the free applicative at any point: we have inlined all of the “zipping” to make it absolutely clear that everything has stayed linear).

I have yet to really dive deep on any of the theory involved in this type, I just quickly wrote up this post when I realised I was able to use the cayley transform from the mentioned papers to implement the proper breadth-first traversal. It certainly seems worth looking at more!

Capriotti, Paolo, and Ambrus Kaposi. 2014. “Free Applicative
Functors.” *Electronic Proceedings in Theoretical
Computer Science* 153 (June): 2–30. doi:10.4204/EPTCS.153.2. http://www.paolocapriotti.com/assets/applicative.pdf.

Easterly, Noah. 2019. “Functions and newtype wrappers for
traversing Trees: Rampion/tree-traversals.” https://github.com/rampion/tree-traversals.

Gibbons, Jeremy. 2015. “Breadth-First
Traversal.” *Patterns in Functional Programming*.
https://patternsinfp.wordpress.com/2015/03/05/breadth-first-traversal/.

Rivas, Exequiel, and Mauro Jaskelioff. 2014. “Notions of
Computation as Monoids.”
*arXiv:1406.4823 [cs, math]* (May). http://arxiv.org/abs/1406.4823.

Rivas, Exequiel, Mauro Jaskelioff, and Tom Schrijvers. 2015. “From
monoids to near-semirings: The essence of MonadPlus and
Alternative.” In *Proceedings of the 17th
International Symposium on Principles and
Practice of Declarative Programming*,
196–207. ACM. doi:10.1145/2790449.2790514.
http://www.fceia.unr.edu.ar/~mauro/pubs/FromMonoidstoNearsemirings.pdf.

van Laarhoven, Twan. 2009. “A non-regular data type
challenge.” *Twan van Laarhoven’s Blog*. https://twanvl.nl/blog/haskell/non-regular1.

Tags: Agda

The best approach to this now is probably to use this action, specifically set up for Agda:

I’ll leave the rest of this post here, but bear in mind the advice is outdated.

Recently travis-ci.org announced that they were closing down, and moving to travis-ci.com. For people who use the service, this basically means that the free component is going away, and you’ll have to pay in the future.

As a result, a lot of people are looking to move to another ci service, so I thought I’d put this short guide together on how to use GitHub actions to typecheck an Agda project and host the rendered code through GitHub pages. The system I have is quite fast: for a quite large project it takes about a minute from pushing for the action to complete.

If you just want to use the same script as me, you can see it here: the rest of this post will just be going through that script and explaining it.

First things first: in order to make an action, you need to put a
YAML file in the `.github/workflows`

directory of your
repository. You can have the following lines at the start:

```
name: Compile Agda and Deploy HTML
on:
push:
branches:
- master
```

This gives a name for the action (which will show up in the actions
tab online for the repo), and says that the action should be run
whenever there’s a push to the branch named `master`

.

We then list the “jobs” the actions does: just one for this action,
called `build`

:

```
jobs:
build:
```

GitHub actions run on GitHub’s servers, the specifications of which can be seen here. For this action we won’t need anything special, so we’ll just use the following:

` runs-on: ubuntu-18.04`

Next we will have the matrix:

```
strategy:
matrix:
cubical-ref: ["v0.2"]
agda-ref: ["v2.6.1.1"]
ghc-ver: ["8.10.2"]
cabal-ver: ["3.4.0.0"]
```

I’m using this matrix as a crude system for environment variables; if this was a CI for some software I wanted to deploy, you could include multiple values for each variable here, to check that the whole thing runs properly with each.

We’re now onto the “steps” portion of the script, where we write small bash-esque script to be run. As such we have the line:

` steps:`

The first step is to cache all the cabal packages we’re going to install. Agda takes about 45 minutes to install so this step is crucial:

```
- uses: actions/cache@v2
name: Cache cabal packages
id: cache-cabal
with:
path: |
~/.cabal/packages
~/.cabal/store
~/.cabal/bin
dist-newstyle key: ${{ runner.os }}-${{ matrix.ghc-ver }}-${{ matrix.cabal-ver }}-${{ matrix.agda-ref }}
```

The `path`

field tells the action which folders to cache,
the `key`

field tells it what key to store them under.

To install Agda we first need to install cabal:

```
- name: Install cabal
if: steps.cache-cabal.outputs.cache-hit != 'true'
uses: actions/setup-haskell@v1.1.3
with:
ghc-version: ${{ matrix.ghc-ver }}
cabal-version: ${{ matrix.cabal-ver }}
```

The `if`

field here allows us to skip this step if we had
a cache hit previously (i.e. if Agda is already installed).

Next we need to ensure that all of the programs installed by cabal are in the path:

```
- name: Put cabal programs in PATH
run: echo "~/.cabal/bin" >> $GITHUB_PATH
```

And then we download and install Agda (along with some dependencies that aren’t installed automatically):

```
- name: Download Agda from github
if: steps.cache-cabal.outputs.cache-hit != 'true'
uses: actions/checkout@v2
with:
repository: agda/agda
path: agda
ref: ${{ matrix.agda-ref }}
- name: Install Agda
if: steps.cache-cabal.outputs.cache-hit != 'true'
run: |
cabal update
cabal install --overwrite-policy=always --ghc-options='-O2 +RTS -M6G -RTS' alex-3.2.5
cabal install --overwrite-policy=always --ghc-options='-O2 +RTS -M6G -RTS' happy-1.19.12
cd agda
mkdir -p doc
touch doc/user-manual.pdf cabal install --overwrite-policy=always --ghc-options='-O1 +RTS -M6G -RTS'
```

The strange flags to `cabal install`

here are
*probably* necessary: I was running out of memory when I tried to
install Agda without them. This might be fixed in future versions of
Agda.

We next need to install any Agda libraries your code depends on. For instance, in my project, I use the cubical library: since Agda doesn’t have a package manager, we basically have to handle all the versioning and so on manually. Also, in order to speed up the build we have to cache the typecheck files for the library.

```
- name: Checkout cubical library
uses: actions/checkout@v2
with:
repository: agda/cubical
path: cubical
ref: ${{ matrix.cubical-ref }}
- name: Cache cubical library
uses: actions/cache@v2
id: cache-cubical
with:
path: ~/cubical-build
key: ${{ runner.os }}-${{ matrix.agda-ver }}-${{ matrix.cubical-ref }}
```

So the library is accessible as an import we need to put it in the Agda library list:

```
- name: Put cubical library in Agda library list
run: |
mkdir -p ~/.agda/
touch ~/.agda/libraries echo "$GITHUB_WORKSPACE/cubical/cubical.agda-lib" > ~/.agda/libraries
```

We then need to typecheck the library: this bit is a little tricky, since not all files in the cubical library actually typecheck.

```
- name: Compile cubical library
if: steps.cache-cubical.outputs.cache-hit != 'true'
run: |
cd $GITHUB_WORKSPACE/cubical
agda Cubical/Core/Everything.agda
agda Cubical/Foundations/Everything.agda
find Cubical/Data -type f -name "*.agda" | while read -r code ; do
agda $code
done
find Cubical/HITs -type f -name "*.agda" | while read -r code ; do
agda $code
done cp -f -r _build/ ~/cubical-build
```

Finally, if the cubical library was already typechecked then we don’t need to do any of that, and we instead just retrieve it from the cache:

```
- name: Retrieve cubical library
if: steps.cache-cubical.outputs.cache-hit == 'true'
run: |
mkdir -p cubical/_build cp -f -r ~/cubical-build/* cubical/_build
```

Finally we have to typecheck the library itself. We want to cache the
output from this step as well, but importantly we want to support
incremental recompilation: i.e. if we only make a small change in one
file we don’t want to have to typecheck every other. We can do this with
`restore-keys`

in the cache:

```
- name: Checkout main
uses: actions/checkout@v2
with:
path: main
- uses: actions/cache@v2
name: Cache main library
id: cache-main
with:
path: ~/main-build
key: html-and-tex-${{ runner.os }}-${{ matrix.agda-ver }}-${{ matrix.cubical-ref }}-${{ hashFiles('main/**') }}
restore-keys: |
html-and-tex-${{ runner.os }}-${{ matrix.agda-ver }}-${{ matrix.cubical-ref }}-
html-and-tex-${{ runner.os }}-${{ matrix.agda-ver }}-
- name: Retrieve main library
if: steps.cache-main.outputs.cache-hit == 'true'
run: cp -f -R ~/main-build/* $GITHUB_WORKSPACE/main
```

Finally, we need to make an “Everything” file: this is an Agda module which contains an import for every module in the project. Typechecking this file is faster than typechecking each file individually.

```
- name: Compile main library
if: steps.cache-main.outputs.cache-hit != 'true'
run: |
mkdir -p ~/main-build/_build
cp -f -R ~/main-build/_build $GITHUB_WORKSPACE/main/_build
rm -r ~/main-build
cd main
find . -type f \( -name "*.agda" -o -name "*.lagda" \) > FileList
sort -o FileList FileList
echo "{-# OPTIONS --cubical #-}" > Everything.agda
echo "" >> Everything.agda
echo "module Everything where" >> Everything.agda
echo "" >> Everything.agda
echo "-- This file imports every module in the project. Click on" >> Everything.agda
echo "-- a module name to go to its source." >> Everything.agda
echo "" >> Everything.agda
cat FileList | cut -c 3- \
| cut -f1 -d'.' \
| sed 's/\//\./g' \
| sed 's/^/open import /' \
>> Everything.agda
rm FileList
agda --html --html-dir=docs Everything.agda
rm Everything.agda
cd .. cp -f -R main/ ~/main-build/
```

And then we need to deploy the generated `html`

so we can
see the rendered library.

```
- name: Deploy html to github pages
uses: peaceiris/actions-gh-pages@v3
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: main/docs
```

This last step will need you to turn on the github pages setting in
your repository, and have it serve from the `gh-pages`

branch.

Hopefully this script will be useful to some other people! The first time it runs it should take between 30 minutes and an hour; subsequently it takes about a minute for me.

]]>
Tags: Combinators

There are a bunch of “minimal” computational models out there: Turing machines, lambda calculus, PowerPoint (Wildenhain 2017), etc. These are radically simple languages which are nonetheless Turing complete, so theoretically “as powerful” as each other. Of those, lambda calculus is without question my favourite to actually write programs in: it’s the one which is closest to crawling out of the Turing tarpit.

In terms of implementation, though, it is *far* from simple.
Lambda calculus has *variables*, which introduce huge complexity
into the interpreter: especially if you want to do any kind of formal
reasoning about programs, this complexity is a problem. We might want to
reach for something even lower-level than lambda calculus: this is where
combinator calculi come in.

You may have heard of SKI combinator calculus: it’s the “simplest” of
the calculi, but it’s not actually very easy to understand, and it’s
absolute murder to try use. So we’re going to start with
`BCKW`

, a more obscure calculus, actually invented by Haskell
Curry.

There are 4 combinators in `BCKW`

: `B`

,
`C`

, `K`

, and `W`

(shocking, I know).
You can think about these combinators as functions which manipulate the
beginning of strings:

```
Bxyz ~> x(yz)
Cxyz ~> xzy
Kxy ~> x
Wxy ~> xyy
```

Upper case letters are combinators, lower-case are variables. Yes, yes, I know I said that combinator calculi didn’t need variables, and it doesn’t! I’m just using them here to explain how each of the combinators work. If you really want to be pedantic you can think of the lower case letters as notational placeholders meaning “any given combinator”. They won’t exist in any actual programs we write.

Let’s work with some examples to get a sense for how these combinators work.

The simplest combinator is `K`

: it’s actually equivalent
to the `const`

function from Haskell. It discards its second
argument, and returns the first. If you give a combinator more arguments
than it usually accepts, you just keep the extra arguments in the
output:

`Kxyz ~> xz`

`W`

is the next combinator: it *duplicates* its
second argument.

`Wxy ~> xyy`

We always start from the *left*, applying the rule for the
left-most combinator first.

```
WKxyz ~> Kxxyz ~> xyz
KWxyz ~> Wyz ~> yzz
```

Next we have `C`

: this is equivalent to the Haskell
function `flip`

. It swaps the second and third arguments:

`Cxyz ~> xzy`

Here’s a small little evaluator for expressions which use
`C`

, `K`

, and `W`

. You can edit the
expression, and press enter to step through it.

The last combinator introduces parentheses, and it’s equivalent to function composition.

`Bxyz ~> x(yz)`

You can write parentheses yourself: implicitly, all expressions are left-associative. That means that the following are all equal:

`xyz = (xy)z = (x)yz = ((x)y)z`

But `xyz`

is *not* equal to, say,
`x(yz)`

.

And here’s a puzzle to start flexing your combinator skills: one of
the combinators in SKI combinator calculus is `I`

, which is
the identity function.

`Ix ~> x`

Try write an expression which functions the same way as
`I`

, using only the `BCKW`

combinators. Use the
following evaluator to try and figure out how to do it: write an
expression after `λ>`

which functions the same as
`I`

.

`CK`

followed by any combinator will do the trick. So
`CKB`

, `CKK`

, `CKC`

, etc.

`I = CKC`

Update 19/10/2020: A few people have pointed out (Joachim Breitner was the
first) that there is a shorter solution to this problem:
`WK`

. I tend to prefer solutions that don’t include
`W`

, since then we’re working in a subset of the language
that is both terminating and affine; although in this case the reason I
didn’t mention `WK`

is that I just didn’t find it myself.
Each of the combinators we’ve defined so far work a little weird: they seem to skip over their first argument, and work on their second. Indeed, there is another, equivalent combinator calculus which doesn’t have this peculiarity:

```
Bxyz ~> x(yz)
Axy ~> y
Mx ~> xx
Txy ~> yx
```

`B`

stays the same in this calculus, but the rest of the
combinators get switched out for seemingly simpler versions.
`K`

goes to `A`

^{1}:

```
Axy ~> y
Kxy ~> x
```

Which isn’t a huge change. It’s the other two where we see the real
difference. `W`

has been swapped out for `M`

:

```
Wxy ~> xyy
Mx ~> xx
```

As you can see `W`

basically does the same thing as
`M`

, but while passing through its first argument. The
difference between `T`

and `C`

is similar:

```
Cxyz ~> xzy
Txy ~> yx
```

So, first of all, it is pretty simple to show that `BCKW`

contains all of the `BAMT`

combinators. Try find a way to
write `T`

using only `BCKW`

combinators (hint: you
might want to use your previous answer for writing `I`

using
`BCKW`

).

So in fact all of the changed `BAMT`

combinators can be
encoded using `BCKW`

by putting `I`

(or
`CKC`

or what have you) after the corresponding
`BCKW`

combinator. In other words:

```
T = CI = C(CKC)
A = KI = K(CKC)
M = WI = W(CKC)
```

It’s pretty easy to go from `BCKW`

to `BAMT`

,
then. However, it’s *extremely* difficult to go the other way.
Here, try to write `K`

in terms of `BAMT`

(this is
quite difficult, do not expect to get it!):

Either of the following would work:

```
B(TA)(BBT)
B(B(TA)B)T
```

So this is why we will stick to `BCKW`

for the time being:
`BAMT`

is just too painful to use.

One of the things `BCKW`

has over `SKI`

is that
each combinator represents a concrete capability. `K`

and
`W`

especially: without these combinators, we can neither
duplicate nor discard variables. This makes the languages without one or
both of these interesting (albeit not Turing-complete).

If we say that we can’t use `W`

, we know that the will not
duplicate any input. In fact, encoded appropriately, we know that the
program can only decrease its size through execution. The
`BCK`

system is in fact an encoding of *affine* logic,
which is all the rage nowadays. Rust uses affine types to guarantee
memory safety: by preventing duplication of references, you can know
that whenever you’re looking at a variable you’re free to modify it, or
destroy it if necessary (obviously Rust is a bit more complex than what
I’ve described here, but `BCK`

is indeed the fundamental
basis for the system in the same way that `SK`

can be the
basis for any programming language).

If we remove `K`

as well we have a *linear*
language. This is even more restrictive, but is also quite actively
researched at the moment: linear types have been used to construct
languages for differential privacy, for instance.

There’s one small issue with `BC`

: it doesn’t (strictly
speaking) have an equivalent to `I`

. You can write an
expression which is *close*, but it will only actually compute
when applied to at least 3 arguments. See if you can find it.

`BCC`

Usually we add `I`

, though, to give us
`BCI`

.

`S`

is the only combinator we haven’t seen yet. It’s kind
of a combination of `B`

, `C`

, and
`W`

:

`Sxyz ~> xz(yz)`

It does parenthesising, reordering, *and* duplication. This
allows it to be powerful enough to be Turing complete only with the
addition of `K`

. Try first to construct `I`

given
only `S`

and `K`

:

`SK`

followed by any combinator will suffice.

`I = SKK = SKS`

And now construct `S`

from `BCKW`

:

`S = B(BW)(BBC) = B(B(BW)C)(BB)`

Of course, to show that `SK`

is universal we’d need to
show that it contains one of the other universal systems. We won’t do
that exhaustively here, but first just try to figure out `B`

and `W`

:

`B = S(KS)K`

`W = SS(SK) = SS(KI)`

The next task is to encode the `Y`

combinator. This is a
combinator that evaluates to the following:

`Yf ~> f(Yf)`

As you can see, it encodes *recursion*. Like the
`fix`

function in Haskell, this combinator allows us to do
recursion without explicit self-reference. And, of course, we can define
this combinator using the combinators we’ve seen before, since our
language is Turing complete. One encoding is `BM(CBM)`

:

As you can see, `BM(CBM)`

, when applied to `f`

,
yields `f(M(CBMf))`

, which is equivalent to
`f(BM(CBM)f)`

(the `B`

just hasn’t been applied
inside the `f`

). So this is indeed a proper recursion
combinator.

Let’s try doing a little bit of programming with these combinators now.

In the lambada calculus, to encode numbers we often use the
*church* numerals: that’s what we’re going to do here, too. A
church numeral representing some number
$n$
is a function which takes two arguments, and applies the first argument
to the second
$n$
times. Here are some church numerals in Haskell:

```
zero :: (a -> a) -> a -> a
zero f x = x
one :: (a -> a) -> a -> a
one f x = f x
two :: (a -> a) -> a -> a
two f x = f (f x)
three :: (a -> a) -> a -> a
three f x = f (f (f x))
```

Encoding these numerals in combinators is a little more difficult.
Zero and one are obvious: they are `A`

and `I`

,
respectively. Try to figure out two and three:

`WB`

`SB(WB)`

It turns out that it’s pretty easy to encode numbers in a relatively
small amount of space, using a binary encoding. First, multiplication on
Church numerals is simply composition: so that’s `B`

on our
combinators. We already have 2 defined, so the next thing we need for a
binary encoding is a successor function. And we know what *that*
is, from the answer to 3!

This means we can encode normal number in $\mathcal{O}(\log n)$ space (although it still takes linear time to evaluate). The following repl allows for numbers:

We could take up even less space if we allowed for non-normal forms. 4, for instance, could be encoded like so:

`M(WB)`

But we generally prefer to keep our encodings in normal form: otherwise there’s some extra evaluation we have to pay for when we go to use them.

Once upon a time SKI combinators were used as a target for functional
compilers: Miranda, Haskell’s precursor, compiled down to a set of
combinators which included `SKI`

. Nowadays, Haskell is
compiled to the “spineless tagless G-machine”: its compilation technique
took over from combinators in the late 80s, and has been the dominant
form since. Apparently the reason is that, on the current architecture
of most computers, combinator-based compilation targets just aren’t fast
enough. They generate too much garbage: as a result, switching to the
STG yielded about a 40% speedup.

A lot of this information comes from two talks, by the way:

- An Introduction to Combinator Compilers and Graph Reduction Machines, by David Graunke (2016), which goes through a high-level history and explanation of combinator compilers and why we switched away from them. A very interesting tidbit in this talk was that some people started making custom hardware to handle combinator calculi a little better. Even more interesting is the fact that these days we have FPGAs all over the place, so maybe combinator compilers are ripe for reintroduction?
- Combinators Revisited, by Edward Kmett (2018), which goes through a little more of the details of the problems with combinator compilers, and mentions some of the places in which we’re tantalisingly close to making combinator compilation work.

So compilation to combinators was once upon a time an extremely active area of research, but it has since fallen by the wayside a little because our current hardware is unable to evaluate it efficiently. What this means for us, though, is that there’s a large body of work on how to compile lambda terms to combinators!

We use the following basic combinator set for compilation:
`SKIBC`

. `S`

is really the most important one
here: of course we only need it and `K`

, but we use
`I`

because it dramatically simplifies the expressions we
generate, and we use `B`

and `C`

because they are
special cases of `S`

, as we’ll see in a second. The
translation works like so:

```
\x. e1 e2 -> S (\x. e1) (\x. e2)
\x. x -> I
\x. e -> K e
```

The translation works bottom-up. We’re only interested in removing the lambdas: combinator calculus does have application, after all, so there’s nothing we need to do in that case. For that reason, the algorithm is often called “abstraction elimination”, and it’s the one the pointfree.io uses to automatically pointfree Haskell expressions.

There are three forms of abstraction: abstraction into an expression
which is an application, abstraction which returns its argument, and
abstraction which returns something other than its argument. In the
first case, we use `S`

to pass the argument down each branch
of the abstraction. In the second, we just use `I`

. And in
the third case, we use `K`

to just ignore the argument. We
won’t ever get `\x. \y. e`

, since the algorithm works
bottom-up, so the `\y. e`

is eliminated before looking at the
`\x. \y. e`

.

`B`

and `C`

work like special cases of
`S`

: when we pass `x`

down both branches of the
application in the first case, sometimes that work is unnecessary.
Sometimes one of the branches doesn’t use the passed variable: in this
case, we use `B`

or `C`

, depending on which branch
ignores the variable.

```
\x. e1 e2, x ∉ e1 -> B e1 (\x. e2)
\x. e1 e2, x ∉ e2 -> C (\x. e1) e2
```

There is one issue with this approach: it produces combinator
expressions which are of order
$\mathcal{O}(n^3)$
larger than the corresponding lambda expression. With some tricks (like
our usage of `C`

and `B`

) we can get that down to
$\mathcal{O}(n^2)$,
but that’s still a pretty unpleasant size increase.

The issue is that we’re basically passing the arguments as a singly-linked list, where naive access is $\mathcal{O(n^2)}$, and more sophisticated access is $\mathcal{O}(n)$.

Oleg Kiselyov wrote a paper (2018) on getting this down to $\mathcal{O}(n)$, with some memoisation. There’s also a blog post (Lynn 2018), describing how to get that conversion without memoisation in $\mathcal{O}(n \log n)$ time, and an online implementation here.

That’s all for this post! I’ll probably write more about combinators in the future: they’re an extremely interesting subject, and a lot of fun as puzzles to mess around with. One thing that I haven’t mentioned is the connection between combinators and concatenative languages: it turns out that these two things are pretty much the same thing! Maybe I’ll look at it in a future post.

Graunke, David. 2016. “An Introduction to
Combinator Compilers and Graph Reduction
Machines.” St. Louis. https://www.youtube.com/watch?v=GawiQQCn3bk.

Kiselyov, Oleg. 2018.
“$\lambda$
to SKI, Semantically.” In
*Functional and Logic Programming*, ed by. John P.
Gallagher and Martin Sulzmann, 33–50. Lecture Notes in
Computer Science. Cham: Springer
International Publishing. doi:10.1007/978-3-319-90686-7_3.
http://okmij.org/ftp/tagless-final/ski.pdf.

Kmett, Edward. 2018. “Combinators Revisited.”
Wesley Conference Centre, Sydney, Australia. https://yowconference.com/talks/edward-kmett/yow-lambda-jam-2018/combinators-revisited-5919.

Lynn, Ben. 2018. “Ben Lynn’s Online
Garbage: Lambda the
Penultimate.” *Ben Lynn’s Online Garbage*. https://benlynn.blogspot.com/2018/11/lambda-penultimate_16.html.

Wildenhain, Tom. 2017. “On the Turing Completeness of
MS PowerPoint.” http://www.andrew.cmu.edu/user/twildenh/PowerPointTM/Paper.pdf.

If you want to look up these combinators elsewhere, this is the only one you won’t be able to find: it’s much less common than

`K`

, and where I have found it people just call it`K`

, so I had to pick a different letter to distinguish it↩︎

Tags: Haskell

It’s been a while since I last wrote a post (I’ve been busy with my Master’s thesis, which is nearly done), so I thought I would quickly throw out some fun snippets of Haskell I had reason to write over the past couple of weeks.

For some reason, until recently I had been under the impression that
it was impossible to fuse zips efficiently. In other words, I thought
that `zip`

was like `tail`

, in that if it was
implemented using only `foldr`

it would result in an
asymptotic slowdown (`tail`

is normally
$\mathcal{O}(1)$,
implemented as a fold it’s
$\mathcal{O}(n)$).

Well, it seems like this is not the case. The old zip-folding code I had looks to me now to be the correct complexity: it’s related to How To Zip Folds, by Oleg Kiselyov (although I’m using a different version of the function which can be found on the mailing list). The relevant code is as follows:

```
newtype Zip a b =
Zip { runZip :: a -> (Zip a b -> b) -> b }
zip :: [a] -> [b] -> [(a,b)]
zip xs ys = foldr xf xb xs (Zip (foldr yf yb ys))
where
= runZip yk x xk
xf x xk yk = []
xb _
= (x,y) : xk (Zip yk)
yf y yk x xk = [] yb _ _
```

There are apparently reasons
for why the Prelude’s `zip`

isn’t allowed to fuse both of its
arguments: I don’t fully understand them, however. (in particular the
linked page says that the fused zip would have different strictness
behaviour, but the version I have above seems to function properly).

This version of zip leads to some more fun solutions to folding puzzles, like this one:

Write a function that is equivalent to:

`= reverse (zip (reverse xs) (reverse ys)) zipFromEnd xs ys`

Without creating any intermediate lists.

The desired function is interesting in that, instead of lining up
lists according to their first elements, it aligns them according to the
*ends*.

```
>>> zipFromEnd [1,2,3] "abc"
1,'a'),(2,'b'),(3,'c')]
[(
>>> zipFromEnd [1,2,3] "abcd"
1,'b'),(2,'c'),(3,'d')]
[(
>>> zipFromEnd [1,2,3,4] "abc"
2,'a'),(3,'b'),(4,'c')] [(
```

The solution here is just to use `foldl`

, and we get the
following:

```
zipFromEnd :: [a] -> [b] -> [(a,b)]
= foldl xf xb xs (Zip (foldl yf yb ys)) []
zipFromEnd xs ys where
= runZip yk x xk
xf xk x yk = zs
xb _ zs
= xk (Zip yk) ((x,y) : zs)
yf yk y x xk zs = zs yb _ _ zs
```

Another function which is a little interesting is the “zip longest” function:

```
zipLongest :: (a -> a -> a) -> [a] -> [a] -> [a]
= foldr xf xb xs (Zip (foldr yf yb ys))
zipLongest c xs ys where
= runZip yk (Just x) xk
xf x xk yk = runZip zs Nothing xb
xb zs
Nothing xk = y : xk (Zip yk)
yf y yk Just x) xk = c x y : xk (Zip yk)
yf y yk (
Nothing _ = []
yb Just x) zs = x : zs (Zip yb) yb (
```

Finally, all of these functions rely on the `Zip`

type,
which is *not* strictly positive. This means that we can’t use it
in Agda, and it’s tricky to reason about: I wonder what it is about
functions for deforestation that tends to lead to non-strictly-positive
datatypes.

The next puzzle I was interested in was finding the next lexicographic permutation of some string. In other words, given some string $s$, you need to find another string $t$ that is a permutation of $s$ such that $s < t$, and that there is no string $u$ that is a permutation of $s$ and $s < u < t$. The Wikipedia article on the topic is excellent (and clear), but again the algorithm is described in extremely imperative terms:

- Find the largest index k such that a[k] < a[k + 1]. If no such index exists, the permutation is the last permutation.
- Find the largest index l greater than k such that a[k] < a[l].
- Swap the value of a[k] with that of a[l].
- Reverse the sequence from a[k + 1] up to and including the final element a[n].

The challenge here is to write this algorithm without doing any indexing: indexing is expensive on Haskell lists, and regardless it is cleaner to express it without.

I managed to work out the following:

```
nextLexPerm :: Ord a => [a] -> Maybe [a]
= Nothing
nextLexPerm [] :xs) = go1 x xs
nextLexPerm (xwhere
= Nothing
go1 _ [] :xs) = maybe (go2 i j [] xs) (Just . (i:)) (go1 j xs)
go1 i (j
go2 i j xs ys| j <= i = Nothing
| otherwise = Just (fromMaybe (j : foldl (flip (:)) (i:xs) ys) (go3 i (j:xs) ys))
= Nothing
go3 _ _ [] :ys) = go2 i j xs ys go3 i xs (j
```

This comes from the Rosetta Code problem Circle Sort. This is a strange little sorting algorithm, where basically you compare elements on opposite sides of an array, swapping them as needed. The example given is the following:

`6 7 8 9 2 5 3 4 1`

First we compare (and swap) `6`

and `1`

, and
then `7`

and `4`

, and so on, until we reach the
middle. At this point we split the array in two and perform the
procedure on each half. After doing this once it is not the case that
the array is definitely sorted: you may have to repeat the procedure
several (but finitely many) times, until no swaps are performed.

I have absolutely no idea what the practical application for such an odd algorithm would be, but it seemed like an interesting challenge to try implement it in a functional style (i.e. without indices or mutation).

The first thing we have to do is fold the list in half, so we pair up the right items. We’ve actually seen an algorithm to do this before: it’s often called the “tortoise and the hare”, and our previous use was to check if a list was a palindrome. Here’s how we implement it:

```
halve :: [a] -> [(a,a)]
= snd (go xs xs)
halve xs where
:ys) (_:_:zs) = f y (go ys zs)
go (y:ys) [_] = (ys,[])
go (_= (ys,[])
go ys []
:ys,zs) = (ys, (x,y) : zs)
f x (y
>>> halve [6,7,8,9,2,5,3,4,1]
6,1),(7,4),(8,3),(9,5)] [(
```

Notice that the `2`

in the very middle of the list is
missing from the output: I’ll describe how to handle that element later
on. In the above piece of code, that `2`

actually gets bound
to the underscore (in `(_:ys)`

) in the second clause of
`go`

.

Next we need to do the actual swapping: this is actually pretty straightforward, if we think of the algorithm functionally, rather than imperatively. Instead of swapping things in place, we are building up both halves of the new list, so the “swap” operation should simply decide which list each item goes into.

```
halve :: Ord a => [a] -> ([a],[a])
= tl (go xs xs)
halve xs where
= (lte,gt)
tl (_,lte,gt)
:ys) (_:_:zs) = swap y (go ys zs)
go (y:ys) [_] = (ys,[],[])
go (_= (ys,[],[])
go ys []
:ys,lte,gt)
swap x (y| x <= y = (ys, x : lte, y : gt)
| otherwise = (ys, y : lte, x : gt)
```

At this point we can also see what to do with the middle item: we’ll put it in the higher or lower list, depending on a comparison with the element it’s next to.

```
halve :: Ord a => [a] -> ([a],[a])
= tl (go xs xs)
halve xs where
= (lte,gt)
tl (_,lte,gt)
:ys) (_:_:zs) = swap y (go ys zs)
go (y= (ys,[],[])
go ys [] :ys) [_] = (ys,[y | e],[y | not e])
go (ywhere e = y <= head ys
:ys,lte,gt)
swap x (y| x <= y = (ys, x : lte, y : gt)
| otherwise = (ys, y : lte, x : gt)
```

Next, we can use this as a helper function in the overall recursive function.

```
circleSort :: Ord a => [a] -> [a]
= []
circleSort [] = [x]
circleSort [x] =
circleSort xs let (lte,gt) = halve xs
in circleSort lte ++ circleSort (reverse gt)
```

This function isn’t correct (yet). As we mentioned already, we need to run the circle sort procedure multiple times until no swaps occur. We can add in the tracking of swaps like so:

```
circleSort :: Ord a => [a] -> [a]
= if swapped then circleSort ks else ks
circleSort xs where
= go xs
(swapped,ks)
= (False, [])
go [] = (False, [x])
go [x] =
go xs let (s,_,lte,gt) = halve xs xs
= go lte
(sl,lte') = go (reverse gt)
(sg,gt' ) in (s || sl || sg, lte' ++ gt')
:ys) (_:_:zs) = swap y (halve ys zs)
halve (y= (False,ys,[],[])
halve ys [] :ys) [_] = (False,ys,[y | e],[y | not e])
halve (ywhere e = y <= head ys
:ys,lte,gt)
swap x (s,y| x <= y = (s ,ys, x : lte, y : gt)
| otherwise = (True,ys, y : lte, x : gt)
```

So at this point we actually have a working implementation of the
function, which avoids indices as intended. It has some problems still,
though. First, we call `++`

, when we could be using
difference lists. Here’s the solution to that:

```
circleSort :: Ord a => [a] -> [a]
= if swapped then circleSort ks else ks
circleSort xs where
= go xs []
(swapped,ks)
= (False, zs)
go [] zs = (False, x:zs)
go [x] zs =
go xs zs let (s,_,lte,gt) = halve xs xs
= go lte gt'
(sl,lte') = go (reverse gt) zs
(sg,gt' ) in (s || sl || sg, lte')
:ys) (_:_:zs) = swap y (halve ys zs)
halve (y= (False,ys,[],[])
halve ys [] :ys) [_] = (False,ys,[y | e],[y | not e])
halve (ywhere e = y <= head ys
:ys,lte,gt)
swap x (s,y| x <= y = (s ,ys, x : lte, y : gt)
| otherwise = (True,ys, y : lte, x : gt)
```

Next we can actually rewrite the `go`

function to allow
for a certain amount of tail recursion (kind of):

```
circleSort :: Ord a => [a] -> [a]
= if swapped then circleSort ks else ks
circleSort xs where
= go xs (False,[])
(swapped,ks)
= (s,ks)
go [] (s,ks) = (s,x:ks)
go [x] (s,ks) =
go xs (s,ks) let (s',_,ls,rs) = halve s xs xs
in go ls (go (reverse rs) (s',ks))
:ys) (_:_:zs) = swap y (halve s ys zs)
halve s (y= (s,ys,[],[])
halve s ys [] :ys) [_] = (s,ys,[y | e],[y | not e])
halve s (ywhere e = y <= head ys
:ys,ls,rs)
swap x (s,y| x <= y = ( s,ys,x:ls,y:rs)
| otherwise = (True,ys,y:ls,x:rs)
```

Next, we call `reverse`

: but we can avoid the reverse by
passing a parameter which tells us which direction we’re walking down
the list. Since the swapping logic is symmetric, we’re able to just
invert some of the functions. It is a little tricky, though:

```
circleSort :: Ord a => [a] -> [a]
= if swapped then circleSort ks else ks
circleSort xs where
= go False xs (False,[])
(swapped,ks)
= (s,ks)
go d [] (s,ks) = (s,x:ks)
go d [x] (s,ks) =
go d xs (s,ks) let (s',_,ls,rs) = halve d s xs xs
in go False ls (go True rs (s',ks))
:ys) (_:_:zs) = swap d y (halve d s ys zs)
halve d s (y= (s,ys,[],[])
halve d s ys [] :ys) [_] = (s,ys,[y | e],[y | not e])
halve d s (ywhere e = y <= head ys
:ys,ls,rs)
swap d x (s,y| bool (<=) (<) d x y = ( d || s,ys,x:ls,y:rs)
| otherwise = (not d || s,ys,y:ls,x:rs)
```

So there it is! The one-pass, purely function implementation of circle sort. Very possibly the most useless piece of code I’ve ever written.

]]>
Tags: Haskell

A week or so ago I gave a presentation on purely functional data
structures as part of an interview^{1}. Here are the
slides:

https://doisinkidney.com/pdfs/purely-functional-data-structures-slides.pdf

The presentation is meant to be about 45 minutes long, and it’s aimed at end of first year computer science students who have done some Haskell and know a little bit about pointers.

The interview went well, by the way! All going well with my master’s I’ll be starting a PhD in Imperial in Nicolas Wu’s group this October.↩︎

Part 2 of a 2-part series on Random Access Lists

```
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeFamilyDependencies #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# OPTIONS_GHC -Wall -fno-warn-unticked-promoted-constructors #-}
module Post where
import Data.Kind
import Prelude hiding (lookup)
import GHC.TypeLits
import Control.Lens hiding (Cons, index)
```

One of the common techniques for building purely functional data structures is to base the structure on some numerical representation (Hinze 1998). Most recently, I read Swierstra (2020), where the binary numbers were used to implement a heterogeneous random-access list (effectively a generic tuple).

I’m going to look today at using the zeroless binary system to implement a similar structure, and see what the differences are.

I have talked about this representation before, so I won’t go into it
in huge depth, but put simply the zeroless binary system represents a
binary number as a string of `1`

s and `2`

s
(i.e. no zeroes). The vast majority of the normal binary operations
(addition, multiplication, etc.) can be implemented with the same broad
efficiency, but this system has one key advantage in that every single
number is uniquely represented. Since we’re going to use these numbers
to index our data types, this is actually extremely useful.

Before we get started, we’ll first define the peculiar type of lists we’re going to use.

```
infixr 5 :-
data Plus a = a :- Star a
data Star a
= Nil
| Some (Plus a)
```

`Star a`

is isomorphic to `[a]`

, so we’ve not
lost any expressive power or anything like that. The usefulness of this
definition is that we have a non-empty list type built in to our list
type, so we don’t have to do conversion back and forth which can be
cumbersome.

Next on to the number itself:

```
data Bit = B1 | B2
type family Inc (xs :: Star Bit) = (ys :: Plus Bit) | ys -> xs where
Inc Nil = B1 :- Nil
Inc (Some (B1 :- xs)) = B2 :- xs
Inc (Some (B2 :- xs)) = B1 :- Some (Inc xs)
```

We’re straight into the type-level operations here, and there’s an
interesting bit of syntax worth pointing out before we move on.
`ys -> xs`

is a type family dependency: it means that we
can uniquely determine `xs`

given `ys`

. This is
very handy for type inference and so on, and is perhaps the main benefit
of the zeroless binary numbers.

Next, we’ll build a tree indexed by these numbers. Now that we’re jumping in to indexing, we’ll need some singletons. Here’s my preferred way to do them:

```
type family The k = (s :: k -> Type) | s -> k
class Known (x :: a) where sing :: The a x
data SBit b where
SB1 :: SBit B1
SB2 :: SBit B2
type instance The Bit = SBit
instance Known B1 where sing = SB1
instance Known B2 where sing = SB2
```

The type family defines the singleton GADTs themselves. The class
`Known`

is for automatically generating singleton values.

On to the tree. We’re actually going to build a *Braun* tree
here, as they are actually particularly clean to implement on the type
level.

```
type family Carry (x :: Bit) (xs :: Star Bit) :: Star Bit where
Carry B1 xs = xs
Carry B2 xs = Some (Inc xs)
data Tree (xs :: Star Bit) a where
Leaf :: Tree Nil a
Branch :: Forest xs a -> Tree (Some xs) a
data Forest (xs :: Plus Bit) a where
Root :: a
-> The Bit x
-> Tree (Carry x xs) a
-> Tree xs a
-> Forest (x :- xs) a
```

We first have a type family which increments a binary number if its
first argument is `B2`

: this will maintain the Braun tree’s
invariant.

Next, we have the tree definition itself, which is split into two
mutual definitions, in much the same way as the `Star`

and
`Plus`

lists previously. Next, to `cons`

something
onto the tree:

```
type family Cons (x :: a) (xs :: Tree ns a) = (ys :: Forest (Inc ns) a) | ys -> x xs where
Cons x Leaf = Root x SB1 Leaf Leaf
Cons x (Branch (Root y SB1 ls rs)) = Root x SB2 (Branch (Cons y rs)) ls
Cons x (Branch (Root y SB2 ls rs)) = Root x SB1 (Branch (Cons y rs)) ls
```

You’ll notice that we can again annotate this type family with injectivity.

So far all we have is a size-indexed tree. We want a
*heterogeneous* tree, meaning that we must next construct a tree
*indexed* by the previous tree. In order to do this, we’ll first
need singletons on the type level:

```
type family Sing (x :: a) = (y :: The a x) | y -> x
type instance Sing B1 = SB1
type instance Sing B2 = SB2
```

This kind of nonsense we’re doing here is precisely the kind of thing obsolesced by dependent types, by the way. If you’re already doing type-level heavy stuff (as we are here) the extra power afforded by full dependent types often means that hacky special cases just turn into standard functions, greatly simplifying things like the above type families.

But anyway, back to the tree:

```
data HTree (xs :: Tree ns Type) where
HLeaf :: HTree Leaf
HNode :: x
-> !(The Bit b)
-> !(HTree ls)
-> !(HTree rs)
-> HTree (Branch (Root x (Sing b) ls rs))
```

And we can `cons`

on an element in much the same way we
did with the homogeneous tree:

```
infixr 5 <:
(<:) :: x -> HTree xs -> HTree (Branch (Cons x xs))
<: HLeaf = HNode x SB1 HLeaf HLeaf
x <: HNode y SB1 yl yr = HNode x SB2 (y <: yr) yl
x <: HNode y SB2 yl yr = HNode x SB1 (y <: yr) yl x
```

The real use of this data structure is quick *indexing*. As
with the previous functions, we will first need to construct the
type-level version of what we want to do.

```
type family Lookup (i :: Star Bit) (xs :: Tree sz a) :: a where
Lookup Nil (Branch (Root x _ _ _)) = x
Lookup (Some (B1 :- i)) (Branch (Root _ _ ls _)) = Lookup i ls
Lookup (Some (B2 :- i)) (Branch (Root _ _ _ rs)) = Lookup i rs
```

While this function is partial, the value-level one should not be: it should be provably in-bounds for lookups. As a result we’ll need a slightly complex type to represent the indices:

```
data Position (xs :: Star Bit) (ys :: Star Bit) where
P0 :: Position Nil (Some ys)
P1 :: !(Position xs (Carry y ys)) -> Position (Some (B1 :- xs)) (Some (y :- ys))
P2 :: !(Position xs ys) -> Position (Some (B2 :- xs)) (Some (y :- ys))
```

A value of type `Position xs ys`

is actually a proof that
`xs`

is smaller than `ys`

, but we’re using it here
just as a pointer to an entry in the tree. Here’s the actual lookup
function itself.

```
lookup :: forall is (ts :: Tree sz Type). Position is sz -> HTree ts -> Lookup is ts
lookup P0 (HNode x _ _ _) = x
lookup (P1 i) (HNode _ _ ls _) = lookup i ls
lookup (P2 i) (HNode _ _ _ rs) = lookup i rs
```

Just having pointers isn’t much use: we also need a way to build
them. The key function here is `push`

: this increments the
index pointed to by one.

```
infixr 5 ::-
data SPlus xs where
(::-) :: The a x -> The (Star a) xs -> SPlus (x :- xs)
data SStar xs where
Nily :: SStar Nil
Somy :: The (Plus a) xs -> SStar (Some xs)
type instance The (Plus a) = SPlus
type instance The (Star a) = SStar
instance Known Nil where sing = Nily
instance Known xs => Known (Some xs) where sing = Somy sing
instance (Known x, Known xs) => Known (x :- xs) where sing = sing ::- sing
```

```
push :: Known ys => Position xs ys -> Position (Some (Inc xs)) (Some (Inc ys))
= go p sing
push p where
go :: Position xs ys -> The (Star Bit) ys -> Position (Some (Inc xs)) (Some (Inc ys))
P0 (Somy (SB1 ::- _ )) = P1 P0
go P0 (Somy (SB2 ::- _ )) = P1 P0
go P2 i) (Somy (SB1 ::- ys)) = P1 (go i ys)
go (P2 i) (Somy (SB2 ::- ys)) = P1 (go i ys)
go (P1 i) (Somy (SB1 ::- _ )) = P2 i
go (P1 i) (Somy (SB2 ::- _ )) = P2 i go (
```

Everything above is pretty much all you need for many use cases, but it’s pretty ugly stuff. To actually use this thing as a generic tuple we’ll need a lot of quality-of-life improvements.

First of all, we should use type-level lists to indicate the tuple itself:

```
type family Length (xs :: [a]) :: Star Bit where
Length '[] = Nil
Length (_ : xs) = Some (Inc (Length xs))
type family FromList (xs :: [a]) = (ys :: Tree (Length xs) a) | ys -> xs where
FromList '[] = Leaf
FromList (x : xs) = Branch (Cons x (FromList xs))
```

```
type family Tuple (xs :: [Type]) = (ys :: Type) | ys -> xs where
Tuple xs = HTree (FromList xs)
```

Because the type family here is injective, we won’t get any of the
usual weird errors when we use the type `Tuple [Bool,String]`

or whatever: passing that around will function almost exactly the same
as passing around the tree representation itself directly.

```
example :: Tuple [Bool,String,Int,(),String]
= True <: "True" <: 1 <: () <: "T" <: HLeaf example
```

We can fold over the tree itself (using the Braun tree folding algorithm from a previous post) if every element in the tree conforms to some class. Using this we can generate a nice string representation of the tree.

`Show`

instance.
```
type family All (c :: a -> Constraint) (xs :: Tree ns a) :: Constraint where
All c Leaf = ()
All c (Branch (Root x _ ls rs)) = (c x, All c ls, All c rs)
newtype Q2 a
= Q2
unQ2 :: (Q2 a -> Q2 a) -> (Q2 a -> Q2 a) -> a
{
}
foldrTree :: forall c xs b. All c xs => (forall x. c x => x -> b -> b) -> b -> HTree xs -> b
= unQ2 (f @c g' n' t b) id id
foldrTree g' n' t where
f :: forall c' ys b'. All c' ys => (forall x. c' x => x -> b' -> b') -> b' -> HTree ys -> Q2 b' -> Q2 b'
HNode x _ l r) xs = Q2 (\ls rs -> g x (unQ2 xs (ls . f @c' g n l) (rs . f @c' g n r)))
f g n (HLeaf _ = Q2 (\_ _ -> n)
f _ n
= Q2 (\ls rs -> unQ2 (ls (rs b)) id id)
b
instance All Show xs => Show (HTree xs) where
showsPrec _ tr = showChar '(' . go (foldrTree @Show (\x xs -> shows x : xs) [] tr)
where
go :: [ShowS] -> ShowS
= showChar ')'
go [] :xs) = x . foldr (\y ys -> showChar ',' . y . ys) (showChar ')') xs go (x
```

```
>>> example
True,"True",1,(),"T") (
```

The approach used in Swierstra (2020) had a specific goal in mind: using the heterogeneous list to implement a lookup table for evaluating lambda calculus. As such, efficiently being able to “increment” an index was vital.

If we wanted to use the type as a generic tuple, though, we would have no such requirement. Instead, we might expect all accesses to be resolved and inlined at compile-time (as in Martinez, Viera, and Pardo 2013). We also would want a nice syntax for accessing parts of the tuple.

We can accomplish all of this with some type classes, as it happens. If we replace pattern-matching on data types with typeclass resolution we can be all but guaranteed that the function calls and so on will be inlined entirely at compile-time (we also would need to add INLINE pragmas to every instance, which I haven’t done here for readability’s sake). The main class we’ll use is the following:

```
class (xs :: Star Bit) < (ys :: Star Bit) where
pull :: forall (t :: Tree ys Type). HTree t -> Lookup xs t
```

```
instance Nil < Some ys where
HNode x _ _ _) = x
pull (
instance xs < ys => Some (B1 :- xs) < Some (B1 :- ys) where
HNode _ _ ls _) = pull @xs ls
pull (
instance xs < Some (Inc ys) => Some (B1 :- xs) < Some (B2 :- ys) where
HNode _ _ ls _) = pull @xs ls
pull (
instance xs < ys => Some (B2 :- xs) < Some (y :- ys) where
HNode _ _ _ rs) = pull @xs rs
pull (
instance TypeError (Text "Index out of range") => xs < Nil where
= error "unreachable"
pull
data Peano = Z | S Peano
type family FromPeano (n :: Peano) = (m :: Star Bit) | m -> n where
FromPeano Z = Nil
FromPeano (S n) = Some (Inc (FromPeano n))
type family FromLit (n :: Nat) :: Peano where
FromLit 0 = Z
FromLit n = S (FromLit (n - 1))
get :: forall n xs (t :: Tree xs Type). FromPeano (FromLit n) < xs
=> HTree t -> Lookup (FromPeano (FromLit n)) t
= pull @(FromPeano (FromLit n)) get
```

Some other details out of the way we get the following nice interface:

```
>>> get @4 example
"T"
```

You even get a type error for out-of-range indices:

`>>> get @7 example`

```
• Index out of range
• In the expression: get @7 example
```

Or we could even add a lens interface:

```
type family Replace (i :: Star Bit) (x :: a) (xs :: Tree sz a) :: Tree sz a where
Replace Nil x (Branch (Root _ b ls rs)) = Branch (Root x b ls rs)
Replace (Some (B1 :- i)) x (Branch (Root y b ls rs)) = Branch (Root y b (Replace i x ls) rs)
Replace (Some (B2 :- i)) x (Branch (Root y b ls rs)) = Branch (Root y b ls (Replace i x rs))
class (xs :: Star Bit) <! (ys :: Star Bit) where
index :: forall (t :: Tree ys Type) b. Lens (HTree t) (HTree (Replace xs b t)) (Lookup xs t) b
instance Nil <! Some ys where
index f (HNode x b ls rs) = fmap (\x' -> HNode x' b ls rs) (f x)
instance xs <! ys => Some (B1 :- xs) <! Some (B1 :- ys) where
index f (HNode x b ls rs) = fmap (\ls' -> HNode x b ls' rs) (index @xs f ls)
instance xs <! Some (Inc ys) => Some (B1 :- xs) <! Some (B2 :- ys) where
index f (HNode x b ls rs) = fmap (\ls' -> HNode x b ls' rs) (index @xs f ls)
instance xs <! ys => Some (B2 :- xs) <! Some (y :- ys) where
index f (HNode x b ls rs) = fmap (\rs' -> HNode x b ls rs') (index @xs f rs)
instance TypeError (Text "Index out of range") => xs <! Nil where
index = error "unreachable"
ind :: forall n xs (t :: Tree xs Type) a. FromPeano (FromLit n) <! xs
=> Lens (HTree t) (HTree (Replace (FromPeano (FromLit n)) a t)) (Lookup (FromPeano (FromLit n)) t) a
= index @(FromPeano (FromLit n)) ind
```

```
>>> over (ind @1) length example
True,4,1,(),"T") (
```

The approach I’ve taken here is actually a little unusual: in both
Hinze
(1998) and Swierstra
(2020) the tree is defined as a *nested* data type. Let’s
take a look at that approach, while also switching to Agda.

```
: Set
𝔹 = List Bool
𝔹
pattern 1ᵇ = false
pattern 2ᵇ = true
data When (A : Set a) : Bool → Set a where
: When A false
O⟨⟩ _⟩ : A → When A true
I⟨
infixl 4 _×2
record _×2 (A : Set a) : Set a where
constructor _,_
field
: A
fst snd open _×2
infixr 5 ⟨_⟩+_+2×_
data Array (A : Set a) : 𝔹 → Set a where
: Array A []
O _⟩+_+2×_ : ∀ {n ns} → A → When A n → Array (A ×2) ns → Array A (n ∷ ns) ⟨
```

The cons function here is really no more complex than the previous cons:

```
: 𝔹 → List Bool
inc = 1ᵇ ∷ []
inc [] (1ᵇ ∷ xs) = 2ᵇ ∷ xs
inc (2ᵇ ∷ xs) = 1ᵇ ∷ inc xs
inc
: ∀ {ns} → A → Array A ns → Array A (inc ns)
cons = ⟨ x ⟩+ O⟨⟩ +2× O
cons x O (⟨ x₂ ⟩+ O⟨⟩ +2× xs) = ⟨ x₁ ⟩+ I⟨ x₂ ⟩ +2× xs
cons x₁ (⟨ x₂ ⟩+ I⟨ x₃ ⟩ +2× xs) = ⟨ x₁ ⟩+ O⟨⟩ +2× cons (x₂ , x₃) xs cons x₁
```

But what I’m really interested in, again, is *indexing*. In
particular, I’m interested in using an actual binary number to index
into this structure, rather than the weird GADT we had to use in
Haskell. One of the advantages of using full dependent types is that we
can write functions like the following:

```
: ∀ is → Array A xs → is < xs → A
lookup = {!!} lookup
```

In other words, we can pass the proof term separately. This can help performance a little, but mainly it’s nice to use the actual number type one intended to use along with all of the functions we might use on that term.

So let’s get writing! The first thing to define is the proof of
`<`

. I’m going to define it in terms of a boolean function
on the bits themselves, i.e.:

```
_<ᴮ_ : 𝔹 → 𝔹 → Bool
_<ᴮ_ = {!!}
: Bool → Set
T = ⊤
T true = ⊥
T false
_<_ : 𝔹 → 𝔹 → Set
= T (x <ᴮ y) x < y
```

This will mean the proofs themselves are easy to pass around without
modification. In fact, we can go further and have the compiler
*definitionally* understand that the proof of
`x < y`

is proof irrelevant, with Agda’s `Prop`

.

```
record ⊤ : Prop where constructor tt
data ⊥ : Prop where
: Bool → Prop
T = ⊤
T true = ⊥
T false
_<_ : 𝔹 → 𝔹 → Prop
= T (x <ᴮ y) x < y
```

Next, the functions which compute the actual comparison.

```
_&_≲ᵇ_ : Bool → Bool → Bool → Bool
= s or y
s & false ≲ᵇ y = s and y
s & true ≲ᵇ y
_&_≲ᴮ_ : Bool → 𝔹 → 𝔹 → Bool
= s
s & [] ≲ᴮ [] (y ∷ ys) = true
s & [] ≲ᴮ (x ∷ xs) ≲ᴮ [] = false
s & (x ∷ xs) ≲ᴮ (y ∷ ys) = (s & x ≲ᵇ y) & xs ≲ᴮ ys
s &
_<ᴮ_ _≤ᴮ_ : 𝔹 → 𝔹 → Bool
_<ᴮ_ = false &_≲ᴮ_
_≤ᴮ_ = true &_≲ᴮ_
```

These functions combine the definitions of `≤`

and
`<`

, and do them both at once. We pass whether the
comparison is non-strict or not as the first parameter: this is worth
doing since both `<`

and `≤`

can be defined in
terms of each other:

```
(1ᵇ ∷ xs) < (2ᵇ ∷ ys) = xs ≤ ys
(2ᵇ ∷ xs) ≤ (1ᵇ ∷ ys) = xs < ys
...
```

Finally the function itself:

```
: ∀ {b} → When A b → A ×2 → A
sel-bit {b = 1ᵇ} _ = snd
sel-bit {b = 2ᵇ} _ = fst
sel-bit
mutual
: ∀ xs {ys} → Array A ys → xs < ys → A
index (⟨ x ⟩+ _ +2× _ ) p = x
index [] (1ᵇ ∷ is) (⟨ _ ⟩+ x +2× xs) p = index₂ is x xs p
index (2ᵇ ∷ is) (⟨ _ ⟩+ x +2× xs) p = sel-bit x (index is xs p)
index
: ∀ xs {y ys} → When A y → Array (A ×2) ys → 1ᵇ ∷ xs < y ∷ ys → A
index₂ = fst (index is xs p)
index₂ is O⟨⟩ xs p = x
index₂ [] I⟨ x ⟩ xs p (i ∷ is) I⟨ _ ⟩ xs p = snd (index₃ i is xs p)
index₂
: ∀ x xs {ys} → Array A ys → x ∷ xs ≤ ys → A
index₃ (⟨ _ ⟩+ x +2× xs) p = index₂ is x xs p
index₃ 2ᵇ is (⟨ x ⟩+ _ +2× _ ) p = x
index₃ 1ᵇ [] (i ∷ is) (⟨ _ ⟩+ x +2× xs) p = sel-bit x (index₃ i is xs p) index₃ 1ᵇ
```

I think Braun trees are a fascinating data structure with lots of interesting aspects. In practice they tend to be much slower than other comparable structures, but they’re extremely simple and have many properties which make them particularly well-suited to type-level programming.

Hinze, Ralf. 1998. *Numerical Representations as
Higher-Order Nested Datatypes*.
Institut für Informatik III, Universität
Bonn.

Martinez, Bruno, Marcos Viera, and Alberto Pardo. 2013. “Just do
it while compiling!: Fast extensible records in haskell.” In
*Proceedings of the ACM SIGPLAN 2013 workshop on
Partial evaluation and program manipulation -
PEPM ’13*, 77. Rome, Italy: ACM
Press. doi:10.1145/2426890.2426908.

Swierstra, Wouter. 2020. “Heterogeneous binary random-access
lists.” *Journal of Functional Programming* 30: e10.
doi:10.1017/S0956796820000064.

Part 9 of a 10-part series on Breadth-First Traversals

Tags: Haskell

This post will be quite light on details: I’m trying to gather up all of the material in this series to be a chapter in my Master’s thesis, so I’m going to leave the heavy-duty explanations and theory for that. Once finished I will probably do a short write up on this blog.

That said, the reason I’m writing this post is that in writing my thesis I figured out a nice way to solve the problem I first wrote about in this post. I won’t restate it in its entirety, but basically we’re looking for a function with the following signature:

`bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)`

Seasoned Haskellers will recognise it as a “traversal”. However, this
shouldn’t be an ordinary traversal: that, after all, can be derived
automatically by the compiler these days. Instead, the Applicative
effects should be evaluated in *breadth-first* order. To put it
another way, if we have a function which lists the elements of a tree in
breadth-first order:

`bfs :: Tree a -> [a]`

Then we should have the following identity:

`-> ([x], x)) t = (bfs t, t) bft (\x `

Using the writer Applicative with the list monoid here as a way to talk about ordering of effects.

There are many solutions to the puzzle (see Gibbons 2015; or Easterly 2019, or any of the posts in this series), but I had found them mostly unsatisfying. They basically relied on enumerating the tree in breadth-first order, running the traversal on the intermediate list, and then rebuilding the tree. It has the correct time complexity and so on, but it would be nice to deforest the intermediate structure a little bit more.

Anyways, the function I finally managed to get is the following:

```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
:& xs) = liftA2 (:&) (f x) (bftF f xs)
bft f (x
bftF :: Applicative f => (a -> f b) -> [Tree a] -> f [Tree b]
= fmap head . foldr (<*>) (pure []) . foldr f [pure ([]:)]
bftF t where
:& xs) (q : qs) = liftA2 c (t x) q : foldr f (p qs) xs
f (x
= [pure ([]:)]
p [] :xs) = fmap (([]:).) x : xs
p (x
: ks) = ((x :& xs) : y) : ys
c x k (xs where (y : ys) = k ks
```

The `Tree`

is
defined like so:

`data Tree a = a :& [Tree a]`

It has all the right properties (complexity, etc.), and if you stick tildes before every irrefutable pattern-match it is also maximally lazy.

As a bonus, here’s another small function I looked at for my thesis. It performs a topological sort of a graph.

```
type Graph a = a -> [a]
topoSort :: Ord a => Graph a -> [a] -> [a]
= fst . foldr f ([], ∅)
topoSort g where
f x (xs,s) | x ∈ s = (xs,s)
| x ∉ s = first (x:) (foldr f (xs, {x} ∪ s) (g x))
```

Easterly, Noah. 2019. “Functions and newtype wrappers for
traversing Trees: Rampion/tree-traversals.” https://github.com/rampion/tree-traversals.

Gibbons, Jeremy. 2015. “Breadth-First
Traversal.” *Patterns in Functional Programming*.
https://patternsinfp.wordpress.com/2015/03/05/breadth-first-traversal/.

Tags: Haskell

Just a short one again today!

There’s an excellent talk by Kenneth Foner at Compose from 2016 which goes through a paper by Danvy and Goldberg (2005) called “There and Back Again” (or TABA). You should watch the talk and read the paper if you’re in any way excited by the weird and wonderful algorithms we use in functional languages to do simple things like reversing a list.

The function focused on in the paper is one which does the following:

```
zipRev :: [a] -> [b] -> [(a,b)]
= zip xs (reverse ys) zipRev xs ys
```

But does it in one pass, *without* reversing the second list.
It uses a not-insignificant bit of cleverness to do it, but you can
actually arrive at the same solution in a pretty straightforward way by
aggressively converting everything you can to a fold. The result is the
following:

```
zipRev :: [a] -> [b] -> [(a,b)]
= foldl f b ys xs
zipRev xs ys where
= []
b _ :xs) = (x,y) : k xs f k y (x
```

I have written a little more on this function and the general technique before.

The talk goes through the same stuff, but takes a turn then to proving the function total: our version above won’t work correctly if the lists don’t have the same length, so it would be nice to provide that guarantee in the types somehow. Directly translating the version from the TABA paper into one which uses length-indexed vectors will require some nasty, expensive proofs, though, which end up making the whole function quadratic. The solution in the talk is to call out to an external solver which gives some extremely slick proofs (and a very nice interface). However, yesterday I realised you needn’t use a solver at all: you can type the Haskell version just fine, and you don’t even need the fanciest of type-level features.

As ever, the solution is another fold. To demonstrate this rather short solution, we’ll first need the regular toolbox of types:

```
data Nat = Z | S Nat
data Vec (a :: Type) (n :: Nat) where
Nil :: Vec a Z
(:-) :: a -> Vec a n -> Vec a (S n)
```

And now we will write a length-indexed left fold on this vector. The key trick here is that the type passed in the recursive call changes, by composition:

`newtype (:.:) (f :: b -> Type) (g :: a -> b) (x :: a) = Comp { unComp :: f (g x) }`

Safe coercions will let us use the above type safely without a performance hit, resulting in the following linear-time function:

```
foldlVec :: forall a b n. (forall m. a -> b m -> b (S m)) -> b Z -> Vec a n -> b n
Nil = b
foldlVec f b :- xs) = unComp (foldlVec (c f) (Comp (f x b)) xs)
foldlVec f b (x where
c :: (a -> b (S m) -> b (S (S m))) -> (a -> (b :.: S) m -> (b :.: S) (S m))
= coerce
c {-# INLINE c #-}
```

We can use this function to write vector reverse:

```
reverseVec :: Vec a n -> Vec a n
= foldlVec (:-) Nil reverseVec
```

Now, to write the reversing zip, we need another newtype to put the parameter in the right place, but it is straightforward other than that.

```
newtype VecCont a b n = VecCont { runVecCont :: Vec a n -> Vec (a,b) n }
revZip :: Vec a n -> Vec b n -> Vec (a,b) n
= flip $ runVecCont .
revZip
foldlVec-> VecCont (\(x :- xs) -> (x,y) :- runVecCont k xs))
(\y k VecCont (const Nil)) (
```

Danvy, Olivier, and Mayer Goldberg. 2005. “There and Back
Again.” *Fundamenta Informaticae* 66 (4)
(December): 397–413. https://cs.au.dk/~danvy/DSc/08_danvy-goldberg_fi-2005.pdf.

Part 8 of a 10-part series on Breadth-First Traversals

{-# OPTIONS --cubical --sized-types #-} module Post where open import ../code/terminating-tricky-traversals/Post.Prelude

Just a short one today. I’m going to look at a couple of algorithms for breadth-first traversals with complex termination proofs.

In a previous post I talked about breadth-first traversals over graphs, and the difficulties that cycles cause. Graphs are especially tricky to work with in a purely functional language, because so many of the basic algorithms are described in explicitly mututing terms (i.e. “mark off a node as you see it”), with no obvious immutable translation. The following is the last algoirthm I came up with:

```
bfs :: Ord a => (a -> [a]) -> a -> [[a]]
= takeWhile (not.null) (map fst (fix (f r . push)))
bfs g r where
= ([],Set.empty) : [ ([],seen) | (_,seen) <- xs ]
push xs @((l,s):qs)
f x q| Set.member x s = q
| otherwise = (x:l, Set.insert x s) : foldr f qs (g x)
```

As difficult as it is to work with graphs in a pure functional
language, it’s even *more* difficult to work in a *total*
language, like Agda. Looking at the above function, there are several
bits that we can see right off the bat won’t translate over easily.
Let’s start with `fix`

.

We shouldn’t expect to be able to write `fix`

in Agda
as-is. Just look at its Haskell implementation:

```
fix :: (a -> a) -> a
= f (fix f) fix f
```

It’s obviously non total!

(this is actually a non-memoizing version of `fix`

, which
is different from the usual
one)

We can write a function *like* `fix`

, though, using
coinduction and sized types.

record Thunk (A : Size → Type a) (i : Size) : Type a where coinductive field force : ∀ {j : Size< i} → A j open Thunk public fix : (A : Size → Type a) → (∀ {i} → Thunk A i → A i) → ∀ {j} → A j fix A f = f λ where .force → fix A f

Coinductive types are the dual to inductive types. Totality-wise, a coinductive type must be “productive”; i.e. a coinductive list can be infinitely long, but it must be provably able to evaluate to a constructor (cons or nil) in finite time.

Sized types also help us out here: they’re quite subtle, and a little finicky to use occasionally, but they are invaluable when it comes to proving termination or productivity of complex (especially higher-order) functions. The canonical example is mapping over the following tree type:

module NonTerminating where data Tree (A : Type a) : Type a where _&_ : A → List (Tree A) → Tree A {-# TERMINATING #-} mapTree : (A → B) → Tree A → Tree B mapTree f (x & xs) = f x & map (mapTree f) xs

The compiler can’t tell that the recursive call in the
`mapTree`

function will only be called on subnodes of the
argument: it can’t tell that it’s structurally recursive, in other
words. Annoyingly, we can fix the problem by inlining
`map`

.

mutual mapTree′ : (A → B) → Tree A → Tree B mapTree′ f (x & xs) = f x & mapForest f xs mapForest : (A → B) → List (Tree A) → List (Tree B) mapForest f [] = [] mapForest f (x ∷ xs) = mapTree′ f x ∷ mapForest f xs

The other solution is to give the tree a size parameter. This way, all submodes of a given tree will have smaller sizes, which will give the compiler a finite descending chain condition it can use to prove termination.

data Tree (A : Type a) (i : Size) : Type a where _&_ : A → ∀ {j : Size< i} → List (Tree A j) → Tree A i mapTree : (A → B) → Tree A i → Tree B i mapTree f (x & xs) = f x & map (mapTree f) xs

So how do we use this stuff in our graph traversal? Well first we’ll need a coinductive Stream type:

record Stream (A : Type a) (i : Size) : Type a where coinductive field head : A tail : ∀ {j : Size< i} → Stream A j open Stream public smap : (A → B) → Stream A i → Stream B i smap f xs .head = f (xs .head) smap f xs .tail = smap f (xs .tail)

And then we can use it to write our breadth-first traversal.

bfs : ⦃ _ : IsDiscrete A ⦄ → (A → List A) → A → Stream (List A) i bfs g r = smap fst (fix (Stream _) (f r ∘ push)) where push : Thunk (Stream _) i → Stream _ i push xs .head = ([] , []) push xs .tail = smap (_,_ [] ∘ snd) (xs .force) f : _ → Stream _ i → Stream _ i f x qs with (x ∈? qs .head .snd) .does ... | true = qs ... | false = λ where .head → (x ∷ qs .head .fst , x ∷ qs .head .snd) .tail → foldr f (qs .tail) (g x)

How do we convert this to a list of lists? Well, for this condition we would actually need to prove that there are only finitely many elements in the graph. We could actually use Noetherian finiteness for this: though I have a working implementation, I’m still figuring out how to clean this up, so I will leave it for another post.

A recent paper (Nipkow and Sewell 2020) provided Coq
proofs for some algorithms on Braun trees (Okasaki 1997),
which prompted me to take a look at them again. This time, I came up
with an interesting linear-time `toList`

function, which
relies on the following peculiar type:

```
newtype Q2 a
= Q2
unQ2 :: (Q2 a -> Q2 a) -> (Q2 a -> Q2 a) -> a
{ }
```

Even after coming up with the type myself, I still can’t really make
heads nor tails of it. If I squint, it starts to look like some bizarre
church-encoded binary number (but I have to *really* squint). It
certainly seems related to corecursive queues (Smith 2009).

Anyway, we can use the type to write the following lovely
`toList`

function on a Braun tree.

```
toList :: Tree a -> [a]
= unQ2 (f t b) id id
toList t where
Node x l r) xs = Q2 (\ls rs -> x : unQ2 xs (ls . f l) (rs . f r))
f (Leaf xs = Q2 (\_ _ -> [])
f
= Q2 (\ls rs -> unQ2 (ls (rs b)) id id) b
```

So can we convert it to Agda?

Not really! As it turns out, this function is even more difficult to
implement than one might expect. We can’t even *write* the
`Q2`

type in Agda without getting in trouble.

{-# NO_POSITIVITY_CHECK #-} record Q2 (A : Type a) : Type a where inductive field q2 : (Q2 A → Q2 A) → (Q2 A → Q2 A) → A open Q2

`Q2`

isn’t strictly positive, unfortunately.

{-# TERMINATING #-} toList : Braun A → List A toList t = f t n .q2 id id where n : Q2 A n .q2 ls rs = ls (rs n) .q2 id id f : Braun A → Q2 (List A) → Q2 (List A) f leaf xs .q2 ls rs = [] f (node x l r) xs .q2 ls rs = x ∷ xs .q2 (ls ∘ f l) (rs ∘ f r)

Apparently this problem of strict positivity for breadth-first traversals has come up before: Berger, Matthes, and Setzer (2019); Hofmann (1993).

Update 31/01/2020

Daniel Peebles (@copumpkin on twitter) replied to my tweet about this post with the following:

Interesting! Curious

howyou came up with that weird type at the end. It doesn’t exactly feel like the first thing one might reach for and it would be interesting to see some writing on the thought process that led to it

So that’s what I’m going to add here!

Let’s take the Braun tree of the numbers 1 to 15:

```
┌8
┌4┤
│ └12
┌2┤
│ │ ┌10
│ └6┤
│ └14
1┤
│ ┌9
│ ┌5┤
│ │ └13
└3┤
│ ┌11
└7┤
└15
```

Doing a normal breadth-first traversal for the first two levels is
fine (1, 2, 3): it starts to fall apart at the third level (4, 6, 5, 7).
Here’s the way we should traverse it: “all of the left branches, and
then all of the right branches”. So, we will have a queue of trees. We
take the root element of each tree in the queue, and emit it, and then
we add all of the *left* children of the trees in the queue to
one queue, and then all the *right* children to another, and then
concatenate them into a new queue and we start again. We can stop
whenever we hit an empty tree because of the structure of the Braun
tree. Here’s an ascii diagram to show what’s going on:

```
┌8 | ┌8 | ┌8 | 8
┌4┤ | ┌4┤ | 4┤ |
│ └12 | │ └12 | └12 | 9
┌2┤ | 2┤ | |
│ │ ┌10 | │ ┌10 | ┌9 | 10
│ └6┤ | └6┤ | 5┤ |
│ └14 | └14 | └13 | 11
1┤ --> -----> -------->
│ ┌9 | ┌9 | ┌10 | 12
│ ┌5┤ | ┌5┤ | 6┤ |
│ │ └13 | │ └13 | └14 | 13
└3┤ | 3┤ | |
│ ┌11 | │ ┌11 | ┌11 | 14
└7┤ | └7┤ | 7┤ |
└15 | └15 | └15 | 15
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
```

If we want to do this in Haskell, we have a number of options for how we would represent queues: as ever, though, I much prefer to use vanilla lists and time the reversals so that they stay linear. Here’s what that looks like:

```
toList :: Tree a -> [a]
= f t b [] []
toList t where
Node x l r) xs ls rs = x : xs (l : ls) (r : rs)
f (Leaf _ _ _ = []
f
= foldr f b (reverse ls ++ reverse rs) [] [] b ls rs
```

Any place we see a `foldr`

being run after a reverse or a
concatenation, we know that we can remove a pass (in actual fact rewrite
rules will likely do this automatically for us).

```
toList :: Tree a -> [a]
= f b t [] []
toList t where
Node x l r) xs ls rs = x : xs (l : ls) (r : rs)
f (Leaf _ _ _ = []
f
= foldl (flip f) (foldl (flip f) b rs) ls [] [] b ls rs
```

Finally, since we’re building up the lists with `:`

(in a
linear way, i.e. we will not use the intermediate queues more than
once), and we’re immediately consuming them with a fold, we can deforest
the intermediate list, replacing every `:`

with
`f`

(actually, it’s a little more tricky than that, since we
replace the `:`

with the *reversed* version of
`f`

, i.e. the one you would pass to `foldr`

if you
wanted it to act like `foldl`

. This trick is explained in
more detail in this
post).

```
toList :: Tree a -> [a]
= f t b id id
toList t where
Node x l r) xs ls rs = x : xs (ls . f l) (rs . f r)
f (Leaf _ _ _ = []
f
= ls (rs b) id id b ls rs
```

Once you do that, however, you run into the “cannot construct the infinite type” error. To be precise:

`• Occurs check: cannot construct the infinite type: a3 ~ (a3 -> c0) -> (a3 -> c1) -> [a2]`

And this gives us the template for our newtype! It requires some trial and error, but you can see where some of the recursive calls are, and what you eventually get is the following:

```
newtype Q2 a
= Q2
unQ2 :: (Q2 a -> Q2 a) -> (Q2 a -> Q2 a) -> [a]
{ }
```

(You can remove the list type constructor at the end, I did as I
thought it made it slightly more general). And from there we get back to
the `toList`

function.

Berger, Ulrich, Ralph Matthes, and Anton Setzer. 2019. “Martin
Hofmann’s Case for
Non-Strictly Positive Data Types.” In
*24th international conference on types for proofs and programs
(TYPES 2018)*, ed by. Peter Dybjer, José Espírito
Santo, and Luís Pinto, 130:22. Leibniz international proceedings in
informatics (LIPIcs). Dagstuhl, Germany:
Schloss DagstuhlLeibniz-Zentrum fuer Informatik. doi:10.4230/LIPIcs.TYPES.2018.1.
http://drops.dagstuhl.de/opus/volltexte/2019/11405.

Hofmann, Martin. 1993. “Non Strictly Positive
Datatypes in System F.” https://www.seas.upenn.edu/~sweirich/types/archive/1993/msg00027.html.

Nipkow, Tobias, and Thomas Sewell. 2020. “Proof pearl:
Braun trees.” In *Certified programs and proofs,
CPP 2020*, ed by. J. Blanchette and C. Hritcu, –.
ACM. http://www21.in.tum.de/~nipkow/pubs/cpp20.html.

Okasaki, Chris. 1997. “Three Algorithms on
Braun Trees.” *Journal of Functional
Programming* 7 (6) (November): 661–666. doi:10.1017/S0956796897002876.
https://www.eecs.northwestern.edu/~robby/courses/395-495-2013-fall/three-algorithms-on-braun-trees.pdf.

Smith, Leon P. 2009. “Lloyd Allison’s
Corecursive Queues: Why Continuations
Matter.” *The Monad.Reader* 14 (14) (July): 28. https://meldingmonads.files.wordpress.com/2009/06/corecqueues.pdf.

In dependently typed languages, it’s often important to figure out a good “low-level” representation for some concept. The natural numbers, for instance:

`data Nat = Z | S Nat`

For “real” applications, of course, these numbers are offensively
inefficient, in terms of both space and time. But that’s not what I’m
after here: I’m looking for a type which best describes the essence of
the natural numbers, and that can be used to prove and think about them.
In that sense, this representation is second to none: it’s basically the
simplest possible type which *can* represent the naturals.

Let’s nail down that idea a little better. What do we mean when a type is a “good” representation for some concept.

There should be no redundancy. The type for the natural numbers above has this property: every natural number as one (and only one) canonical representative in

`Nat`

. Compare that to the following possible representation for the integers:`data Int = Neg Nat | Pos Nat`

There are two ways to represent

`0`

here: as`Pos Z`

or`Neg Z`

.Of course, you can quotient out the redundancy in Cubical Agda, or normalise on construction every time, but either of these workarounds gets your representation a demerit.

Operations should be definable simply and directly on the representation. Points docked for converting to and from some non-normalised form.

That conversion, however, can exist, and ideally should exist, in some fundamental way. You should be able to establish an efficient isomorphism with other representations of the same concept.

Properties about the type should correspond to intuitive properties about the representation. For

`Nat`

above, this means things like order: the usual order on the natural numbers again has a straightforward analogue on`Nat`

.

With that laundry list of requirements, it’s no wonder that it’s often tricky to figure out the “right” type for a concept.

In this post, I’m going to talk about a type for the rational numbers, and I’m going to try satisfy those requirements as best I can.

Our first attempt at representing the rationals might use a fraction:

`data Frac = Integer :/ Integer`

This obviously fails the redundancy property. The fractions $\frac{1}{2}$ and $\frac{2}{4}$ represent the same number, but have different underlying values.

So the type isn’t suitable as a potential representation for the rationals. That’s not to say that this type is useless: far from it! Indeed, Haskell’s Data.Ratio uses something quite like this to implement rationals.

If you’re going to deal with redundant elements, there are two broad ways to deal with it. Data.Ratio’s approach is to normalise on construction, and only export a constructor which does this. This gives you a pretty good guarantee that there won’t be any unreduced fractions lying around in you program. Agda’s standard library also uses an approach like this, although the fact that the numerator and denominator are coprime is statically verified by way of a proof carried in the type.

The other way to deal with redundancy is by quotient. In Haskell, that kind of means doing the following:

```
instance Eq Frac where
:/ xd) == (y :/ yd) = (x * yd) == (y * xd)
(x
instance Ord Frac where
compare (x :/ xd) (y :/ yd) = compare (x * yd) (y * xd)
```

We don’t have real quotient types in Haskell, but this gets the idea
across: we haven’t normalised our representation internally, but as far
as anyone *using* the type is concerned, they shouldn’t be able
to tell the difference between
$\frac{1}{2}$
and
$\frac{2}{4}$.

Th `Num`

instance is pretty much just a restating of the
axioms for fractions.

`Num`

instance for `Frac`

.
```
instance Num Frac where
fromInteger n = n :/ 1
:/ xd) * (y :/ yd) = (x * y) :/ (xd * yd)
(x :/ xd) + (y :/ yd) = (x * yd + y * xd) :/ (xd * yd)
(x signum (n :/ d) = signum (n * d) :/ 1
abs n = signum n * n
:/ xd) - (y :/ yd) = (x * yd - y * xd) :/ (xd * yd) (x
```

Cubical Agda, of course, *does* have real quotient types.
There, the `Eq`

instance becomes a path constructor.

```
data ℚ : Type₀ where
_÷_ : (n d : ℕ) → ℚ
: ∀ xⁿ xᵈ yⁿ yᵈ →
reduce →
xⁿ ℕ* yᵈ ≡ yⁿ ℕ* xᵈ xⁿ ÷ xᵈ ≡ yⁿ ÷ yᵈ
```

But we’ll leave the Agda stuff for another post.

Now we get to the cool stuff. To reduce a fraction, we usually do something like getting the greatest common divisor of each operand. One nice way to do that is to use Euclid’s algorithm:

```
gcd :: Natural -> Natural -> Natural
gcd n m = case compare n m of
EQ -> n
LT -> gcd n (m - n)
GT -> gcd (n - m) m
```

Let’s run that function on three different inputs: $\frac{2}{3}$, $\frac{4}{6}$, and $\frac{5}{6}$.

```
gcd 2 3 => case compare 2 3 of
LT -> gcd 2 (3 - 2) => case compare 2 1 of
GT -> gcd (2 - 1) 1 => case compare 1 1 of
EQ -> 1
gcd 4 6 => case compare 4 6 of
LT -> gcd 4 (6 - 4) => case compare 4 2 of
GT -> gcd (4 - 2) 2 => case compare 2 2 of
EQ -> 2
gcd 5 6 => case compare 5 6 of
LT -> gcd 5 (6 - 5) => case compare 5 1 of
GT -> gcd (5 - 1) 1 => case compare 4 1 of
GT -> gcd (4 - 1) 1 => case compare 3 1 of
GT -> gcd (3 - 1) 1 => case compare 2 1 of
GT -> gcd (2 - 1) 1 => case compare 1 1 of
EQ -> 1
```

Those all return the right things, but that’s not what’s interesting
here: look at the chain of comparison results. For the two fractions
which are equivalent, their *chains* are equal.

This turns out to hold in general. Every rational number can be (uniquely!) represented as a list of bits, where each bit is a comparison result from Euclid’s algorithm.

```
data Bit = O | I
type Rational = [Bit]
abs :: Frac -> Rational
abs = unfoldr f
where
:/ d) = case compare n d of
f (n EQ -> Nothing
LT -> Just (O, n :/ (d - n))
GT -> Just (I, (n - d) :/ d)
```

And since we used `unfoldr`

, it’s easy to reverse the
algorithm to convert from the representation to a pair of numbers.

```
rep :: Rational -> Frac
= foldr f (1 :/ 1)
rep where
I (n :/ d) = (n + d) :/ d
f O (n :/ d) = n :/ (n + d) f
```

Now `abs . rep`

is the identity function, and
`rep . abs`

reduces a fraction! We have identified an
isomorphism between our type (a list of bits) and the rational
numbers!

Well, between the positive rational numbers. Not to worry: we can add
a sign before it. And, because our type doesn’t actually include 0, we
don’t get the duplicate 0 problems we did with `Int`

.

```
data Q
= Neg Rational
| Zero
| Pos Rational
```

We can also define some operations on the type, by converting back and forth.

```
instance Num Rational where
fromInteger n = abs (n :/ 1)
+ ys = abs (rep xs + rep ys)
xs * ys = abs (rep xs * rep ys)
xs - ys = abs (rep xs - rep ys) xs
```

So we have a construction that has our desired property of canonicity. Even better, there’s a reasonably efficient algorithm to convert to and from it! Our next task will be examining the representation itself, and seeing what information we can get from it.

To do so we’ll turn to the subject of the title of this post: the Stern-Brocot tree.

This tree, pictured above, has some incredible properties:

- It contains every rational number (in reduced form) exactly once.
- It is a binary search tree.

Both of these properties make it an excellent candidate for basing a
representation on. As it turns out, that’s what we already did! Our list
of bits above is precisely a path into the Stern-Brocot tree, where
every `O`

is a left turn and every `I`

right.

The most important fact we’ve gleaned so far from the Stern-Brocot
tree is that our representation is lexicographically ordered. While that
may not seem like much, it turns our list of bits into a
progressively-narrowing interval, which generates more and more accurate
estimates of the true value. When we see a `O`

at the head of
the list, we know that the result must be smaller than `1`

;
what follows will tell us on what side of
$\frac{1}{2}$
the answer lies, and so on.

This turns out to be quite a useful property: we often don’t need
*exact* precision for some calculation, but rather some
approximate answer. It’s even rarer still that we know exactly how much
precision we need for a given expression (which is what floating point
demands). Usually, the precision we need changes quite dynamically. If a
particular number plays a more influential role in some expression, for
instance, its precision is more important than the others!

By producing a lazy list of bits, however, we can allow the
*consumer* to specify the precision they need, by demanding those
bits as they go along. (In the literature, this kind of thing is
referred to as “lazy exact arithmetic”, and it’s quite fascinating. The
representation presented here, however, is not very suitable for any
real computation: it’s incredibly slow. There is a paper on the topic:
Niqui
(2007), which examines the Stern-Brocot numbers in Coq).

In proofs, the benefit is even more pronounced: finding out that a number is in a given range by just inspecting the first element of the list gives an excellent recursion strategy. We can do case analysis on: “what if it’s 1”, “what if it’s less than 1”, and “what if it’s greater than 1”, which is quite intuitive.

There’s one problem: our evaluation function is defined as a
`foldr`

, and forces the accumulator at every step. We will
need to figure out another evaluator which folds from the left.

So let’s look more at the “interval” interpretation of the Stern-Brocot tree. The first interval is $\left(\frac{0}{1},\frac{1}{0}\right)$: neither of these values are actually members of the type, which is why we’re not breaking any major rules with the $\frac{1}{0}$. To move left (to $\frac{1}{2}$ in the diagram), we need to use a peculiar operation called “child’s addition”, often denoted with a $\oplus$.

$\frac{a}{b} \oplus \frac{c}{d} = \frac{a+c}{b+d}$

The name comes from the fact that it’s a very common mistaken definition of addition on fractions.

Right, next steps: to move *left* in an interval, we do the
following:

$\text{left} \left(\mathit{lb},\mathit{ub} \right) = \left( \mathit{lb}, \mathit{lb} \oplus \mathit{ub} \right)$

In other words, we narrow the right-hand-side of the interval. To move right is the opposite:

$\text{right} \left(\mathit{lb},\mathit{ub} \right) = \left( \mathit{lb} \oplus \mathit{ub} , \mathit{ub} \right)$

And finally, when we hit the end of the sequence, we take the
*mediant* value.

$\text{mediant}\left(\mathit{lb} , \mathit{ub}\right) = \mathit{lb} \oplus \mathit{rb}$

From this, we get a straightforward left fold which can compute our fraction.

```
6 :-:
infix data Interval
= (:-:)
lb :: Frac
{ ub :: Frac
,
}
mediant :: Interval -> Frac
:/ d :-: a :/ c) = (a+b) :/ (c+d)
mediant (b
right :: Interval -> Interval
left,= lb x :-: mediant x
left x = mediant x :-: ub x
right x
rep' :: [Bit] -> Frac
= mediant . foldl f ((0 :/ 1) :-: (1 :/ 0))
rep' where
I = right a
f a O = left a f a
```

Before diving in and using this new evaluator to incrementalise our functions, let’s take a look at what’s going on behind the scenes of the “interval narrowing” idea.

It turns out that the “interval” is really a $2\times2$ square matrix in disguise (albeit a little reordered).

$\left( \frac{a}{b} , \frac{c}{d} \right) = \left( \begin{matrix} c & a \\ d & b \end{matrix} \right)$

Seen in this way, the beginning interval—$\left(\frac{0}{1} , \frac{1}{0}\right)$—is actually the identity matrix. Also, the two values in the second row of the tree correspond to special matrices which we will refer to as $L$ and $R$.

$L = \left( \begin{matrix} 1 & 0 \\ 1 & 1 \end{matrix} \right) \; R = \left( \begin{matrix} 1 & 1 \\ 0 & 1 \end{matrix} \right)$

It turns out that the left and right functions we defined earlier correspond to multiplication by these matrices.

$\text{left}(x) = xL$ $\text{right}(x) = xR$

Since matrix multiplication is associative, what we have here is a
monoid. `mempty`

is the open interval at the beginning, and
`mappend`

is matrix multiplication. This is the property that
lets us incrementalise the whole thing, by the way: associativity allows
us to decide when to start and stop the calculation.

We now have all the parts we need. First, we will write an evaluator
that returns increasingly precise intervals. Our friend
`scanl`

fits the requirement precisely.

```
approximate :: [Bit] -> [Interval]
= scanl f mempty
approximate where
I = right i
f i O = left i f i
```

Next, we will need to combine two of these lists with some operation on fractions.

```
interleave :: (Frac -> Frac -> Frac)
-> [Interval]
-> [Interval]
-> [Interval]
*) [xi] ys = map (\y -> x * lb y :-: x * ub y) ys
interleave (where x = mediant xi
*) (x:xs) ys@(y:_) =
interleave (*) `on` lb) x y :-: ((*) `on` ub) x y) : interleave (*) ys xs (((
```

The operation must respect orders in the proper way for this to be valid.

This pops one bit from each list in turn: one of the many possible optimisations would be to pull more information from the more informative value, in some clever way.

Finally, we have a function which incrementally runs some binary operator lazily on a list of bits.

```
quad :: (Frac -> Frac -> Frac)
-> [Bit]
-> [Bit]
-> [Bit]
*) xs ys = foldr f (unfoldr p) zs mempty
quad (where
= (interleave (*) `on` approximate) xs ys
zs
f x xs c| mediant c < lb x = I : f x xs (right c)
| mediant c > ub x = O : f x xs (left c)
| otherwise = xs c
= mediant (last zs)
t
= case compare (mediant c) t of
p c LT -> Just (I, right c)
GT -> Just (O, left c)
EQ -> Nothing
```

The function only ever inspects the next bit when it absolutely needs to.

The helper function `f`

here is the “incremental” version.
`p`

takes over when the precision of the input is
exhausted.

We can use this to write an addition function (with some added special cases to speed things up).

```
add :: [Bit] -> [Bit] -> [Bit]
= I : ys
add [] ys = I : xs
add xs [] I:xs) ys = I : add xs ys
add (I:ys) = I : add xs ys
add xs (= quad (+) xs ys add xs ys
```

We (could) also try and optimise the times we look for a new bit.
Above we have noticed every case where one of the rationals is preceded
by a whole part. After you encounter two `O`

s, in addition if
the two strings are inverses of each other the result will be 1.
i.e. `OOIOOI`

+ `OIOIIO`

=
$\frac{1}{1}$.
We could try and spot this, only testing with comparison of the mediant
when the bits are the same. You’ve doubtless spotted some other possible
optimisations: I have yet to look into them!

One of the other applications of lazy rationals is that they can
begin to *look* like the real numbers. For instance, the
`p`

helper function above is basically defined extensionally.
Instead of stating the value of the number, we give a function which
tells us when we’ve made something too big or too small (which sounds an
awful lot like a Dedekind cut to my ears). Here’s a function which
*inverts* a given function on fractions, for instance.

```
inv :: (Frac -> Frac) -> [Bit] -> [Bit]
= unfoldr f mempty
inv o n where
= fromQ n
t
= case compare (o (mediant c)) t of
f c LT -> Just (I, right c)
GT -> Just (O, left c)
EQ -> Nothing
```

Of course, the function has to satisfy all kinds of extra properties that I haven’t really thought a lot about yet, but no matter. We can use it to invert a squaring function:

```
sqrt :: [Bit] -> [Bit]
sqrt = inv (\x -> x * x)
```

And we can use *this* to get successive approximations to
$\sqrt{2}$!

```
root2Approx= map (toDouble . mediant) (approximate (sqrt (abs (2 :/ 1))))
>>> mapM_ print root2Approx
1.0
2.0
1.5
1.3333333333333333
1.4
1.4285714285714286
1.4166666666666667
1.411764705882353
1.4137931034482758
1.4146341463414633
...
```

Using the Stern-Brocot tree to represent the rationals was formalised in Coq in Bertot (2003). The corresponding lazy operations are formalised in QArith. Its theory and implementation is described in Niqui (2007). Unfortunately, I found most of the algorithms impenetrably complex, so I can’t really judge how they compare to the ones I have here.

I mentioned that one of the reasons you might want lazy rational arithmetic is that it can help with certain proofs. While this is true, in general the two main reasons people reach for lazy arithmetic is efficiency and as a way to get to the real numbers.

From the perspective of efficiency, the Stern-Brocot tree is probably a bad idea. You may have noticed that the right branch of the tree contains all the whole numbers: this means that the whole part is encoded in unary. Beyond that, we generally have to convert to some fraction in order to do any calculation, which is massively expensive.

The problem is that bits in the same position in different numbers don’t necessarily correspond to the same quantities. In base 10, for instance, the numbers 561 and 1024 have values in the “ones” position of 1 and 4, respectively. We can work with those two values independent of the rest of the number, which can lead to quicker algorithms.

Looking at the Stern-Brocot encoding, the numbers
$\frac{2}{3}$
and 3 are represented by `OI`

and `II`

,
respectively. That second `I`

in each, despite being in the
same position, corresponds to *different values*:
$\frac{1}{3}$
in the first, and
$\frac{3}{2}$
in the second.

Solutions to both of these problems necessitate losing the one-to-one property of the representation. We could improve the size of the representation of terms by having our $L$ and $R$ matrices be the following (Kůrka 2014):

$L = \left( \begin{matrix} 1 & 0 \\ 1 & 2 \end{matrix} \right) \; R = \left( \begin{matrix} 2 & 1 \\ 0 & 1 \end{matrix} \right)$

But now there will be gaps in the tree. This basically means we’ll have to use infinite repeating bits to represent terms like $\frac{1}{2}$.

We could solve the other problem by throwing out the Stern-Brocot tree entirely and using a more traditional positional number system. Again, this introduces redundancy: in order to represent some fraction which doesn’t divide properly into the base of the number system you have to use repeating decimals.

The second reason for lazy rational arithmetic is that it can be a crucial component in building a constructive interpretation of the real numbers. This in particular is an area of real excitement at the moment: HoTT has opened up some interesting avenues that weren’t possible before for constructing the reals (Bauer 2016).

In a future post, I might present a formalisation of these numbers in Agda. I also intend to look at the dyadic numbers.

Update 26/12/2019: thanks Anton Felix Lorenzen and Joseph C. Sible for spotting some mistakes in this post.

Bauer, Andrej. 2016. “The real numbers in homotopy type
theory.” Faro, Portugal. http://math.andrej.com/wp-content/uploads/2016/06/hott-reals-cca2016.pdf.

Bertot, Yves. 2003. “A simple canonical representation of rational
numbers.” *Electronic Notes in Theoretical Computer
Science* 85 (7). Mathematics, Logic and
Computation (Satellite Event of
ICALP 2003) (September): 1–16. doi:10.1016/S1571-0661(04)80754-0.
http://www.sciencedirect.com/science/article/pii/S1571066104807540.

Kůrka, Petr. 2014. “Exact real arithmetic for interval number
systems.” *Theoretical Computer Science* 542 (July):
32–43. doi:10.1016/j.tcs.2014.04.030.
http://www.sciencedirect.com/science/article/pii/S0304397514003351.

Niqui, Milad. 2007. “Exact arithmetic on the Stern
tree.” *Journal of Discrete Algorithms* 5 (2). 2004
Symposium on String Processing and
Information Retrieval (June): 356–379. doi:10.1016/j.jda.2005.03.007.
http://www.sciencedirect.com/science/article/pii/S1570866706000311.

Tags: Agda

{-# OPTIONS --safe --without-K #-} module Post where open import Data.Fin using (Fin; suc; zero; _≟_) open import Data.Nat using (ℕ; suc; zero; _+_; compare; equal; greater; less) open import Data.Nat.Properties using (+-comm) open import Data.Bool using (not; T) open import Relation.Nullary using (yes; no; does; ¬_) open import Data.Product using (Σ; Σ-syntax; proj₁; proj₂; _,_) open import Data.Unit using (tt; ⊤) open import Function using (_∘_; id; _⟨_⟩_) open import Relation.Binary.PropositionalEquality using (subst; trans; cong; sym; _≡_; refl; _≢_) open import Data.Empty using (⊥-elim; ⊥) variable n m : ℕ

Here’s a puzzle: can you prove that `Fin`

is injective?
That’s the type constructor, by the way, not the type itself. Here’s the
type of the proof we want:

Goal : Set₁ Goal = ∀ {n m} → Fin n ≡ Fin m → n ≡ m

I’m going to present a proof of this lemma that has a couple interesting features. You should try it yourself before reading on, though: it’s difficult, but great practice for understanding Agda’s type system.

First off, I should say that it’s not really a “new” proof: it’s
basically Andras
Kovac’s proof, with one key change. That proof, as well as this one,
goes `--without-K`

: because I actually use this proof in some
work I’m doing in Cubical Agda at the moment, this was non optional. It
does make things significantly harder, and disallows nice tricks like
the ones
used by effectfully.

The trick we’re going to use comes courtesy of James Wood. The central idea is the following type:

_≢ᶠ_ : Fin n → Fin n → Set x ≢ᶠ y = T (not (does (x ≟ y)))

This proof of inequality of `Fin`

s is different from the
usual definition, which might be something like:

_≢ᶠ′_ : Fin n → Fin n → Set x ≢ᶠ′ y = x ≡ y → ⊥

Our definition is based on the decidable equality of two
`Fin`

s. It also uses the standard library’s new
`Dec`

type. Basically, we get better computation behaviour
from our definition. It behaves as if it were defined like so:

_≢ᶠ″_ : Fin n → Fin n → Set zero ≢ᶠ″ zero = ⊥ zero ≢ᶠ″ suc y = ⊤ suc x ≢ᶠ″ zero = ⊤ suc x ≢ᶠ″ suc y = x ≢ᶠ″ y

The benefit of this, in contrast to `_≢ᶠ′_`

, is that each
case becomes a definitional equality we don’t have to prove. Compare the
two following proofs of congruence under `suc`

:

cong-suc″ : ∀ {x y : Fin n} → x ≢ᶠ″ y → suc x ≢ᶠ″ suc y cong-suc″ p = p cong-suc′ : ∀ {x y : Fin n} → x ≢ᶠ′ y → suc x ≢ᶠ′ suc y cong-suc′ {n = suc n} p q = p (cong fpred q) where fpred : Fin (suc (suc n)) → Fin (suc n) fpred (suc x) = x fpred zero = zero

First, we will describe an “injection” for functions from
`Fin`

s to `Fin`

s.

_F↣_ : ℕ → ℕ → Set n F↣ m = Σ[ f ∈ (Fin n → Fin m) ] ∀ {x y} → x ≢ᶠ y → f x ≢ᶠ f y

We’re using the negated from of injectivity here, which is usually
avoided in constructive settings. It actually works a little better for
us here, though. Since we’re working in the domain of `Fin`

s,
and since our proof is prop-valued, it’s almost like we’re working in
classical logic.

Next, we have the workhorse of the proof, the `shrink`

lemma:

shift : (x y : Fin (suc n)) → x ≢ᶠ y → Fin n shift zero (suc y) x≢y = y shift {suc _} (suc x) zero x≢y = zero shift {suc _} (suc x) (suc y) x≢y = suc (shift x y x≢y) shift-inj : ∀ (x y z : Fin (suc n)) y≢x z≢x → y ≢ᶠ z → shift x y y≢x ≢ᶠ shift x z z≢x shift-inj zero (suc y) (suc z) y≢x z≢x neq = neq shift-inj {suc _} (suc x) zero (suc z) y≢x z≢x neq = tt shift-inj {suc _} (suc x) (suc y) zero y≢x z≢x neq = tt shift-inj {suc _} (suc x) (suc y) (suc z) y≢x z≢x neq = shift-inj x y z y≢x z≢x neq shrink : suc n F↣ suc m → n F↣ m shrink (f , inj) .proj₁ x = shift (f zero) (f (suc x)) (inj tt) shrink (f , inj) .proj₂ p = shift-inj (f zero) (f (suc _)) (f (suc _)) (inj tt) (inj tt) (inj p)

This will give us the inductive step for the overall proof. Notice
the absence of any `cong`

s or the like: the computation
behaviour of `≢ᶠ`

saves us on that particular front. Also we
don’t have to use `⊥-elim`

at any point: again, because of
the computation behaviour of `≢ᶠ`

, Agda knows that certain
cases are unreachable, so we don’t even have to define them.

Next, we derive the proof that a `Fin`

cannot inject into
a smaller `Fin`

.

¬plus-inj : ∀ n m → ¬ (suc (n + m) F↣ m) ¬plus-inj zero (suc m) inj = ¬plus-inj zero m (shrink inj) ¬plus-inj (suc n) m (f , inj) = ¬plus-inj n m (f ∘ suc , inj) ¬plus-inj zero zero (f , _) with f zero ... | ()

That’s actually the bulk of the proof done: the rest is Lego, joining up the pieces and types. First, we give the normal definition of injectivity:

Injective : ∀ {a b} {A : Set a} {B : Set b} → (A → B) → Set _ Injective f = ∀ {x y} → f x ≡ f y → x ≡ y _↣_ : ∀ {a b} → Set a → Set b → Set _ A ↣ B = Σ (A → B) Injective

Then we convert from one to the other:

toFin-inj : (Fin n ↣ Fin m) → n F↣ m toFin-inj f .proj₁ = f .proj₁ toFin-inj (f , inj) .proj₂ {x} {y} x≢ᶠy with x ≟ y | f x ≟ f y ... | no ¬p | yes p = ¬p (inj p) ... | no _ | no _ = tt

And finally we have our proof:

n≢sn+m : ∀ n m → Fin n ≢ Fin (suc (n + m)) n≢sn+m n m n≡m = ¬plus-inj m n (toFin-inj (subst (_↣ Fin n) (n≡m ⟨ trans ⟩ cong (Fin ∘ suc) (+-comm n m)) (id , id))) Fin-inj : Injective Fin Fin-inj {n} {m} n≡m with compare n m ... | equal _ = refl ... | less n k = ⊥-elim (n≢sn+m n k n≡m) ... | greater m k = ⊥-elim (n≢sn+m m k (sym n≡m)) _ : Goal _ = Fin-inj

All in all, the proof is about 36 lines, which is pretty short for what it does.

]]>
Part 1 of a 2-part series on Random Access Lists

Tags: Agda

“Heterogeneous Random-Access Lists” by Wouter Swierstra (2019)
describes how to write a simple binary random-access list (Okasaki
1995) to use as a heterogeneous tuple. If you haven’t tried to
implement the data structure described in the paper before, you might
not realise the just how *elegant* the implementation is. The
truth is that arriving at the definitions presented is difficult: behind
every simple function is a litany of complex and ugly alternatives that
had to be tried and discarded first before settling on the final
answer.

In this post I want to go through a very similar structure, with special focus on the “wrong turns” in implementation which can lead to headache.

Here are a couple of important identities on ℕ:

+0 : ∀ n → n + zero ≡ n +0 zero = refl +0 (suc n) = cong suc (+0 n) +-suc : ∀ n m → n + suc m ≡ suc n + m +-suc zero m = refl +-suc (suc n) m = cong suc (+-suc n m)

These two show up all the time as proof obligations from the compiler
(i.e. “couldn’t match type `n + suc m`

with
`suc n + m`

”). The solution is obvious, right?
`subst`

in one of the proofs above and you’re on your way.
Wait! There might be a better way.

We’re going to look at reversing a vector as an example. We have a normal-looking length-indexed vector:

infixr 5 _∷_ data Vec (A : Set a) : ℕ → Set a where [] : Vec A zero _∷_ : A → Vec A n → Vec A (suc n)

Reversing a list is easy: we do it the standard way, in $\mathcal{O}(n)$ time, with an accumulator:

list-reverse : List A → List A list-reverse = go [] where go : List A → List A → List A go acc [] = acc go acc (x ∷ xs) = go (x ∷ acc) xs

Transferring over to a vector and we see our friends
`+-suc`

and `+0`

.

vec-reverse₁ : Vec A n → Vec A n vec-reverse₁ xs = subst (Vec _) (+0 _) (go [] xs) where go : Vec A n → Vec A m → Vec A (m + n) go acc [] = acc go acc (x ∷ xs) = subst (Vec _) (+-suc _ _) (go (x ∷ acc) xs)

The solution, as with so many things, is to use a fold instead of explicit recursion. Folds on vectors are a little more aggressively typed than those on lists:

vec-foldr : (B : ℕ → Type b) → (∀ {n} → A → B n → B (suc n)) → B zero → Vec A n → B n vec-foldr B f b [] = b vec-foldr B f b (x ∷ xs) = f x (vec-foldr B f b xs)

We allow the output type to be indexed by the list of the vector.
This is a good thing, bear in mind: we need that extra information to
properly type `reverse`

.

For reverse, unfortunately, we need a *left*-leaning fold,
which is a little trickier to implement than `vec-foldr`

.

vec-foldl : (B : ℕ → Set b) → (∀ {n} → B n → A → B (suc n)) → B zero → Vec A n → B n vec-foldl B f b [] = b vec-foldl B f b (x ∷ xs) = vec-foldl (B ∘ suc) f (f b x) xs

With this we can finally `reverse`

.

vec-reverse : Vec A n → Vec A n vec-reverse = vec-foldl (Vec _) (λ xs x → x ∷ xs) []

The real trick in this function is that the type of the return value
changes as we fold. If you think about it, it’s the same optimisation
that we make for the
$\mathcal{O}(n)$
reverse on lists: the `B`

type above is the “difference list”
in types, allowing us to append on to the end without
$\mathcal{O}(n^2)$
proofs.

As an aside, this same trick can let us type the convolve-TABA (Danvy and Goldberg 2005; Foner 2016) function quite simply:

convolve : Vec A n → Vec B n → Vec (A × B) n convolve = vec-foldl (λ n → Vec _ n → Vec _ n) (λ { k x (y ∷ ys) → (x , y) ∷ k ys}) (λ _ → [])

Binary numbers come up a lot in dependently-typed programming languages: they offer an alternative representation of ℕ that’s tolerably efficient (well, depending on who’s doing the tolerating). In contrast to the Peano numbers, though, there are a huge number of ways to implement them.

I’m going to recommend one particular implementation over the others, but before I do I want to define a function on ℕ:

2* : ℕ → ℕ 2* zero = zero 2* (suc n) = suc (suc (2* n))

In all of the implementations of binary numbers we’ll need a function
like this. It is absolutely crucial that it is defined in the way above:
the other obvious definition (`2* n = n + n`

) is a nightmare
for proofs.

Right, now on to some actual binary numbers. The obvious way (a list of bits) is insufficient, as it allows multiple representations of the same number (because of the trailing zeroes). Picking a more clever implementation is tricky, though. One way splits it into two types:

module OneTerminated where infixl 5 _0ᵇ _1ᵇ infixr 4 𝕓_ data 𝔹⁺ : Set where 1ᵇ : 𝔹⁺ _0ᵇ _1ᵇ : 𝔹⁺ → 𝔹⁺ data 𝔹 : Set where 𝕓0ᵇ : 𝔹 𝕓_ : 𝔹⁺ → 𝔹

𝔹⁺ is the strictly positive natural numbers (i.e. the naturals starting from 1). 𝔹 adds a zero to that set. This removes the possibility for trailing zeroes, thereby making this representation unique for every natural number.

⟦_⇓⟧⁺ : 𝔹⁺ → ℕ ⟦ 1ᵇ ⇓⟧⁺ = 1 ⟦ x 0ᵇ ⇓⟧⁺ = 2* ⟦ x ⇓⟧⁺ ⟦ x 1ᵇ ⇓⟧⁺ = suc (2* ⟦ x ⇓⟧⁺) ⟦_⇓⟧ : 𝔹 → ℕ ⟦ 𝕓0ᵇ ⇓⟧ = 0 ⟦ 𝕓 x ⇓⟧ = ⟦ x ⇓⟧⁺

The odd syntax lets us write binary numbers in the natural way:

_ : ⟦ 𝕓 1ᵇ 0ᵇ 1ᵇ ⇓⟧ ≡ 5 _ = refl _ : ⟦ 𝕓 1ᵇ 0ᵇ 0ᵇ 1ᵇ ⇓⟧ ≡ 9 _ = refl

I would actually recommend this representation for most use-cases, especially when you’re using binary numbers “as binary numbers”, rather than as an abstract type for faster computation.

Another clever representation is one I wrote about before: the “gapless” representation. This is far too much trouble for what it’s worth.

Finally, my favourite representation at the moment is
*zeroless*. It has a unique representation for each number, just
like the two above, but it is still a list of bits. The difference is
that the bits here are 1 and 2, not 0 and 1. I like to reuse types in
combination with pattern synonyms (rather than defining new types), as
it can often make parallels between different functions clearer.

Bit : Set Bit = Bool pattern 1ᵇ = false pattern 2ᵇ = true 𝔹 : Set 𝔹 = List Bit

Functions like `inc`

are not difficult to implement:

inc : 𝔹 → 𝔹 inc [] = 1ᵇ ∷ [] inc (1ᵇ ∷ xs) = 2ᵇ ∷ xs inc (2ᵇ ∷ xs) = 1ᵇ ∷ inc xs

And evaluation:

_∷⇓_ : Bit → ℕ → ℕ 1ᵇ ∷⇓ xs = suc (2* xs) 2ᵇ ∷⇓ xs = suc (suc (2* xs)) ⟦_⇓⟧ : 𝔹 → ℕ ⟦_⇓⟧ = foldr _∷⇓_ zero

Since we’re working in Cubical Agda, we might as well go on and prove that 𝔹 is isomorphic to ℕ. I’ll include the proof here for completeness, but it’s not relevant to the rest of the post (although it is very short, as a consequence of the simple definitions).

⟦_⇑⟧ : ℕ → 𝔹 ⟦ zero ⇑⟧ = [] ⟦ suc n ⇑⟧ = inc ⟦ n ⇑⟧ 2*⇔1ᵇ∷ : ∀ n → inc ⟦ 2* n ⇑⟧ ≡ 1ᵇ ∷ ⟦ n ⇑⟧ 2*⇔1ᵇ∷ zero = refl 2*⇔1ᵇ∷ (suc n) = cong (inc ∘ inc) (2*⇔1ᵇ∷ n) 𝔹→ℕ→𝔹 : ∀ n → ⟦ ⟦ n ⇓⟧ ⇑⟧ ≡ n 𝔹→ℕ→𝔹 [] = refl 𝔹→ℕ→𝔹 (1ᵇ ∷ xs) = 2*⇔1ᵇ∷ ⟦ xs ⇓⟧ ; cong (1ᵇ ∷_) (𝔹→ℕ→𝔹 xs) 𝔹→ℕ→𝔹 (2ᵇ ∷ xs) = cong inc (2*⇔1ᵇ∷ ⟦ xs ⇓⟧) ; cong (2ᵇ ∷_) (𝔹→ℕ→𝔹 xs) inc⇔suc : ∀ n → ⟦ inc n ⇓⟧ ≡ suc ⟦ n ⇓⟧ inc⇔suc [] = refl inc⇔suc (1ᵇ ∷ xs) = refl inc⇔suc (2ᵇ ∷ xs) = cong (suc ∘ 2*) (inc⇔suc xs) ℕ→𝔹→ℕ : ∀ n → ⟦ ⟦ n ⇑⟧ ⇓⟧ ≡ n ℕ→𝔹→ℕ zero = refl ℕ→𝔹→ℕ (suc n) = inc⇔suc ⟦ n ⇑⟧ ; cong suc (ℕ→𝔹→ℕ n) 𝔹⇔ℕ : 𝔹 ⇔ ℕ 𝔹⇔ℕ = iso ⟦_⇓⟧ ⟦_⇑⟧ ℕ→𝔹→ℕ 𝔹→ℕ→𝔹

Now on to the data structure. Here’s its type.

infixr 5 _1∷_ _2∷_ data Array (T : ℕ → Type a) : 𝔹 → Type a where [] : Array T [] _∷_ : T (bool 0 1 d) → Array (T ∘ suc) ds → Array T (d ∷ ds) pattern _1∷_ x xs = _∷_ {d = 1ᵇ} x xs pattern _2∷_ x xs = _∷_ {d = 2ᵇ} x xs

So it is a list-like structure, which contains elements of type
`T`

. `T`

is the type of trees in the array: making
the array generic over the types of trees is a slight departure from the
norm. Usually, we would just use a perfect tree or something:

module Prelim where Perfect : Set a → ℕ → Set a Perfect A zero = A Perfect A (suc n) = Perfect (A × A) n

By making the tree type a parameter, though, we actually
*simplify* some of the code for manipulating the tree. It’s
basically the same trick as the type-changing parameter in
`vec-foldl`

.

As well as that, of course, we can use the array with more exotic tree types. With binomial trees, for example, we get a binomial heap:

mutual data BinomNode (A : Set a) : ℕ → Set a where binom-leaf : BinomNode A 0 binom-branch : Binomial A n → BinomNode A n → BinomNode A (suc n) Binomial : Set a → ℕ → Set a Binomial A n = A × BinomNode A n

But we’ll stick to the random-access lists for now.

The perfect trees above are actually a specific instance of a more general data type: exponentiations of functors.

_^_ : (Set a → Set a) → ℕ → Set a → Set a (F ^ zero ) A = A (F ^ suc n) A = (F ^ n) (F A) Nest : (Set a → Set a) → Set a → ℕ → Set a Nest F A n = (F ^ n) A Pair : Set a → Set a Pair A = A × A Perfect : Set a → ℕ → Set a Perfect = Nest Pair

It’s a nested datatype, built in a bottom-up way. This is in contrast to, say, the binomial trees above, which are top-down.

Our first function on the array is `cons`

, which inserts
an element:

cons : (∀ n → T n → T n → T (suc n)) → T 0 → Array T ds → Array T (inc ds) cons branch x [] = x 1∷ [] cons branch x (y 1∷ ys) = branch 0 x y 2∷ ys cons branch x (y 2∷ ys) = x 1∷ cons (branch ∘ suc) y ys

Since we’re generic over the type of trees, we need to pass in the “branch” constructor (or function) for whatever tree type we end up using. Here’s how we’d implement such a branch function for perfect trees.

perf-branch : ∀ n → Perfect A n → Perfect A n → Perfect A (suc n) perf-branch zero = _,_ perf-branch (suc n) = perf-branch n

One issue here is that the `perf-branch`

function probably
doesn’t optimise to the correct complexity, because the `n`

has to be scrutinised repeatedly. The alternative is to define a
`cons`

for nested types, like so:

nest-cons : (∀ {A} → A → A → F A) → A → Array (Nest F A) ds → Array (Nest F A) (inc ds) nest-cons _∙_ x [] = x ∷ [] nest-cons _∙_ x (y 1∷ ys) = (x ∙ y) 2∷ ys nest-cons _∙_ x (y 2∷ ys) = x ∷ nest-cons _∙_ y ys perf-cons : A → Array (Perfect A) ds → Array (Perfect A) (inc ds) perf-cons = nest-cons _,_

Again, we’re going to keep things general, allowing multiple index
types. For those index types we’ll need a type like `Fin`

but
for binary numbers.

data Fin𝔹 (A : Set a) : 𝔹 → Type a where here₁ : Fin𝔹 A (1ᵇ ∷ ds) here₂ : (i : A) → Fin𝔹 A (2ᵇ ∷ ds) there : (i : A) → Fin𝔹 A ds → Fin𝔹 A (d ∷ ds) lookup : (∀ {n} → P → T (suc n) → T n) → Array T ds → Fin𝔹 P ds → T 0 lookup ind (x ∷ xs) here₁ = x lookup ind (x ∷ xs) (here₂ i) = ind i x lookup ind (x ∷ xs) (there i is) = ind i (lookup ind xs is) nest-lookup : (∀ {A} → P → F A → A) → Array (Nest F A) ds → Fin𝔹 P ds → A nest-lookup ind (x ∷ xs) here₁ = x nest-lookup ind (x ∷ xs) (here₂ i) = ind i x nest-lookup ind (x ∷ xs) (there i is) = ind i (nest-lookup ind xs is)

We’ll once more use perfect to show how these generic functions can
be concretised. For the index types into a perfect tree, we will use a
`Bool`

.

perf-lookup : Array (Perfect A) ds → Fin𝔹 Bool ds → A perf-lookup = nest-lookup (bool fst snd)

This next function is quite difficult to get right: a fold. We want
to consume the binary array into a unary, cons-list type thing.
Similarly to `foldl`

on vectors, we will need to change the
return type as we fold, but we will *also* need to convert from
binary to unary, *as we fold*. The key ingredient is the
following function:

2^_*_ : ℕ → ℕ → ℕ 2^ zero * n = n 2^ suc m * n = 2* (2^ m * n)

It will let us do the type-change-as-you-go trick from
`foldl`

, but in a binary setting. Here’s
`foldr`

:

array-foldr : (B : ℕ → Type b) → (∀ n {m} → T n → B (2^ n * m) → B (2^ n * suc m)) → B 0 → Array T ds → B ⟦ ds ⇓⟧ array-foldr B c b [] = b array-foldr B c b (x 1∷ xs) = c 0 x (array-foldr (B ∘ 2*) (c ∘ suc) b xs) array-foldr B c b (x 2∷ xs) = c 1 x (array-foldr (B ∘ 2*) (c ∘ suc) b xs)

And, as you should expect, here’s how to use this in combination with the perfect trees. Here we’ll build a binary random access list from a vector, and convert back to a vector.

perf-foldr : (B : ℕ → Type b) → (∀ {n} → A → B n → B (suc n)) → ∀ n {m} → Perfect A n → B (2^ n * m) → B (2^ n * suc m) perf-foldr B f zero = f perf-foldr B f (suc n) = perf-foldr (B ∘ 2*) (λ { (x , y) zs → f x (f y zs) }) n toVec : Array (Perfect A) ds → Vec A ⟦ ds ⇓⟧ toVec = array-foldr (Vec _) (perf-foldr (Vec _) _∷_) [] fromVec : Vec A n → Array (Perfect A) ⟦ n ⇑⟧ fromVec = vec-foldr (Array (Perfect _) ∘ ⟦_⇑⟧) perf-cons []

That’s the end of the “simple” stuff! The binary random-access list I’ve presented above is about as simple as I can get it.

In this section, I want to look at some more complex (and more fun) things you can do with it. First: lenses.

Lenses aren’t super ergonomic in dependently-typed languages, but they do come with some advantages. The lens laws are quite strong, for instance, meaning that often by constructing programs using a lot of lenses gives us certain properties “for free”. Here, for instance, we can define the lenses for indexing.

open import Lenses

head : Lens (Array T (d ∷ ds)) (T (bool 0 1 d)) head .into (x ∷ _ ) .get = x head .into (_ ∷ xs) .set x = x ∷ xs head .get-set (_ ∷ _) _ = refl head .set-get (_ ∷ _) = refl head .set-set (_ ∷ _) _ _ = refl tail : Lens (Array T (d ∷ ds)) (Array (T ∘ suc) ds) tail .into (_ ∷ xs) .get = xs tail .into (x ∷ _ ) .set xs = x ∷ xs tail .get-set (_ ∷ _) _ = refl tail .set-get (_ ∷ _) = refl tail .set-set (_ ∷ _) _ _ = refl

nest-lens : (∀ {A} → P → Lens (F A) A) → Fin𝔹 P ds → Lens (Array (Nest F A) ds) A nest-lens ln here₁ = head nest-lens ln (here₂ i) = head ⋯ ln i nest-lens ln (there i is) = tail ⋯ nest-lens ln is ⋯ ln i

ind-lens : (∀ {n} → P → Lens (T (suc n)) (T n)) → Fin𝔹 P ds → Lens (Array T ds) (T 0) ind-lens ln here₁ = head ind-lens ln (here₂ i) = head ⋯ ln i ind-lens ln (there i is) = tail ⋯ ind-lens ln is ⋯ ln i

Finally, to demonstrate some of the versatility of this data
structure, we’re going to implement a tree based on a *Fenwick*
tree. This is a data structure for prefix sums: we can query the running
total at any point, and *update* the value at a given point, in
$\mathcal{O}(\log n)$
time. We’re going to make it generic over a monoid:

module _ {ℓ} (mon : Monoid ℓ) where open Monoid mon record Leaf : Set ℓ where constructor leaf field val : 𝑆 open Leaf mutual SumNode : ℕ → Set ℓ SumNode zero = Leaf SumNode (suc n) = Summary n × Summary n Summary : ℕ → Set ℓ Summary n = Σ 𝑆 (fiber (cmb n)) cmb : ∀ n → SumNode n → 𝑆 cmb zero = val cmb (suc _) (x , y) = fst x ∙ fst y Fenwick : 𝔹 → Set ℓ Fenwick = Array Summary

So it’s an array of perfect trees, with each branch in the tree containing a summary of its children. Constructing a tree is straightforward:

comb : ∀ n → Summary n → Summary n → Summary (suc n) comb n xs ys = _ , (xs , ys) , refl sing : 𝑆 → Summary 0 sing x = _ , leaf x , refl fFromVec : Vec 𝑆 n → Fenwick ⟦ n ⇑⟧ fFromVec = vec-foldr (Fenwick ∘ ⟦_⇑⟧) (cons comb ∘ sing) []

Updating a particular point involves a good bit of boilerplate, but isn’t too complex.

upd-lens : Bool → Lens (Summary (suc n)) (Summary n) upd-lens b .into (_ , xs , _) .get = ⦅pair⦆ b .into xs .get upd-lens b .into (_ , xs , _) .set x = _ , ⦅pair⦆ b .into xs .set x , refl upd-lens b .get-set _ = ⦅pair⦆ b .get-set _ upd-lens false .set-get (t , xs , p) i .fst = p i upd-lens false .set-get (t , xs , p) i .snd .fst = xs upd-lens false .set-get (t , xs , p) i .snd .snd j = p (i ∧ j) upd-lens true .set-get (t , xs , p) i .fst = p i upd-lens true .set-get (t , xs , p) i .snd .fst = xs upd-lens true .set-get (t , xs , p) i .snd .snd j = p (i ∧ j) upd-lens false .set-set _ _ _ = refl upd-lens true .set-set _ _ _ = refl top : Lens (Summary 0) 𝑆 top .into x .get = x .snd .fst .val top .into x .set y .fst = y top .into x .set y .snd .fst .val = y top .into x .set y .snd .snd = refl top .get-set _ _ = refl top .set-get (x , y , p) i .fst = p i top .set-get (x , y , p) i .snd .fst = y top .set-get (x , y , p) i .snd .snd j = p (i ∧ j) top .set-set _ _ _ = refl

update : Fin𝔹 Bool ds → Lens (Fenwick ds) 𝑆 update is = ind-lens upd-lens is ⋯ top

Finally, here’s how we get the summary up to a particular point in $\mathcal{O}(\log n)$ time:

running : (∀ n → Bool → T (suc n) → 𝑆 × T n) → (∀ n → T n → 𝑆) → Array T ds → Fin𝔹 Bool ds → 𝑆 × T 0 running l s (x ∷ xs) (there i is) = let y , ys = running (l ∘ suc) (s ∘ suc) xs is z , zs = l _ i ys in s _ x ∙ y ∙ z , zs running l s (x 1∷ xs) here₁ = ε , x running l s (x 2∷ xs) (here₂ i) = l _ i x prefix : Fenwick ds → Fin𝔹 Bool ds → 𝑆 prefix xs is = let ys , zs , _ = running ind (λ _ → fst) xs is in ys ∙ zs where ind : ∀ n → Bool → Summary (suc n) → 𝑆 × Summary n ind n false (_ , (xs , _) , _) = ε , xs ind n true (_ , ((x , _) , (y , ys)) , _) = x , (y , ys)

Danvy, Olivier, and Mayer Goldberg. 2005. “There and Back
Again.” *BRICS Report Series* 12 (3). doi:10.7146/brics.v12i3.21869.

Foner, Kenneth. 2016. “’There and Back
Again’ and What Happened After.” New
York.

Okasaki, Chris. 1995. “Purely Functional
Random-access Lists.” In *Proceedings of
the Seventh International Conference on Functional
Programming Languages and Computer
Architecture*, 86–95. FPCA ’95. New York,
NY, USA: ACM. doi:10.1145/224164.224187.

Swierstra, Wouter. 2019. “Heterogeneous random-access
lists.”

Tags: Haskell

Update 5/10/2019: check the bottom of this post for some links to comments and discussion.

Beginners to Haskell are often confused as to what’s so great about the language. Much of the proselytizing online focuses on pretty abstract (and often poorly defined) concepts like “purity”, “strong types”, and (god forbid) “monads”. These things are difficult to understand, somewhat controversial, and not obviously beneficial (especially when you’ve only been using the language for a short amount of time).

The real tragedy is that Haskell (and other ML-family languages) are
*packed* with simple, decades-old features like pattern matching
and algebraic data types which have massive, clear benefits and few (if
any) downsides. Some of these ideas are finally filtering in to
mainstream languages (like Swift and Rust) where they’re used to great
effect, but the vast majority of programmers out there haven’t yet been
exposed to them.

This post aims to demonstrate some of these features in a simple (but hopefully not too simple) example. I’m going to write and package up a simple sorting algorithm in both Haskell and Python, and compare the code in each. I’m choosing Python because I like it and beginners like it, but also because it’s missing most of the features I’ll be demonstrating. It’s important to note I’m not comparing Haskell and Python as languages: the Python code is just there as a reference for people less familiar with Haskell. What’s more, the comparison is unfair, as the example deliberately plays to Haskell’s strengths (so I can show off the features I’m interested in): it wouldn’t be difficult to pick an example that makes Python look good and Haskell look poor.

This post is not meant to say “Haskell is great, and your language sucks”! It’s not even really about Haskell: much of what I’m talking about here applies equally well to Ocaml, Rust, etc. I’m really writing this as a response to the notion that functional features are somehow experimental, overly complex, or ultimately compromised. As a result of that idea, I feel like these features are left out of a lot of modern languages which would benefit from them. There exists a small set of simple, battle-tested PL ideas, which have been used for nearly forty years now: this post aims to demonstrate them, and argue for their inclusion in every general-purpose programming language that’s being designed today.

We’ll be using a skew heap to sort lists in both languages. The basic idea is to repeatedly insert stuff into the heap, and then repeatedly “pop” the smallest element from the heap until it’s empty. It’s not in-place, but it is $\mathcal{O}(n \log n)$, and actually performs pretty well in practice.

A Skew Heap is represented by a *binary tree*:

```
data Tree a
= Leaf
| Node a (Tree a) (Tree a)
```

```
class Tree:
def __init__(self, is_node, data, lchild, rchild):
self._is_node = is_node
self._data = data
self._lchild = lchild
self._rchild = rchild
def leaf():
return Tree(False, None, None, None)
def node(data, lchild, rchild):
return Tree(True, data, lchild, rchild)
```

I want to point out the precision of the Haskell definition: a tree is either a leaf (an empty tree), or a node, with a payload and two children. There are no special cases, and it took us one line to write (spread to 3 here for legibility on smaller screens).

In Python, we have to write a few more lines^{1}.
This representation uses the `_is_node`

field is
`False`

for an empty tree (a leaf). If it’s
`True`

, the other fields are filled. We write some helper
functions to give us constructors like the leaf and node ones for the
Haskell example.

This isn’t the standard definition of a binary tree in Python, in fact it might looks a little weird to most Python people. Let’s run through some alternatives and their issues.

The standard definition:

`class Tree: def __init__(self, data, lchild, rchild): self._data = data self._lchild = lchild self._rchild = rchild`

Instead of having a separate field for “is this a leaf or a node”, the empty tree is simply

`None`

:`def leaf(): return None`

With this approach, if we define any

*methods*on a tree, they won’t work on the empty tree!`>>> leaf().size() AttributeError: 'NoneType' object has no attribute 'size'`

We’ll do inheritance! Python even has a handy

`abc`

library to help us with some of this:`from abc import ABC, abstractmethod class Tree(ABC): @abstractmethod def size(self): raise NotImplemented class Leaf(Tree): def __init__(self): pass def size(self): return 0 class Node(Tree): def __init__(self, data, lchild, rchild): self._data = data self._lchild = lchild self._rchild = rchild def size(self): return 1 + self._lchild.size() + self._rchild.size()`

Methods will now work on an empty tree, but we’re faced with 2 problems: first, this is very verbose, and pretty complex. Secondly, we can’t write a mutating method which changes a tree from a leaf to a node. In other words, we can’t write an

`insert`

method!We won’t represent a leaf as the whole

*tree*being`None`

, just the data!`def leaf(): return Tree(None, None, None)`

This (surprisingly) pops up in a few places. While it solves the problem of methods, and the mutation problem, it has a serious bug. We can’t have

`None`

as an element in the tree! In other words, if we ask our eventual algorithm to sort a list which contains`None`

, it will silently discard some of the list, returning the wrong answer.

There are yet more options (using a wrapper class), none of them ideal. Another thing to point out is that, even with our definition with a tag, we can only represent types with 2 possible states. If there was another type of node in the tree, we couldn’t simply use a boolean tag: we’d have to switch to integers (and remember the meaning of each integer), or strings! Yuck!

What Python is fundamentally missing here is *algebraic data
types*. This is a way of building up all of your types out of
products (“my type has this *and* this”) and sums (“my type is
this *or* this”). Python can do products perfectly well: that’s
what classes are. The tree class itself is the product of
`Bool`

, `data`

, `Tree`

, and
`Tree`

. However it’s missing an *entire half of the
equation*! This is why you just can’t express binary trees as
cleanly as you can in Swift, Haskell, OCaml, etc. Python, as well as a
host of other languages like Go, Java, etc, will let you express
*one* kind of “sum” type: “or nothing” (the null pointer).
However, it’s clunky and poorly handled in all of those languages (the
method problems above demonstrate the issues in Python), and doesn’t
work for anything other than that one special case.

Again, there’s nothing about algebraic data types that makes them ill-suited to mainstream or imperative languages. Swift uses them, and people love them!

The core operation on skew heaps is the *skew merge*.

```
merge :: Ord a => Tree a -> Tree a -> Tree a
Leaf ys = ys
merge Leaf = xs
merge xs @(Node x xl xr) ys@(Node y yl yr)
merge xs| x <= y = Node x (merge ys xr) xl
| otherwise = Node y (merge xs yr) yl
```

```
def merge(lhs, rhs):
if not lhs._is_node:
return rhs
if not rhs._is_node:
return lhs
if lhs._data <= rhs._data:
return Tree(lhs._data,
merge(rhs, lhs._rchild),
lhs._lchild)else:
return Tree(rhs._data,
merge(lhs, rhs._rchild), rhs._lchild)
```

The standout feature here is pattern matching. In Haskell, we’re able to write the function as we might describe it: “in this case, I’ll do this, in this other case, I’ll do this, etc.”. In Python, we are forced to think of the truth tables and sequential testing. What do I mean by truth tables? Consider the following version of the Python function above:

```
def merge(lhs, rhs):
if lhs._is_node:
if rhs._is_node:
if lhs._data <= rhs._data:
return Tree(lhs._data,
merge(rhs, lhs._rchild),
lhs._lchild)else:
return Tree(rhs._data,
merge(lhs, rhs._rchild),
rhs._lchild)else:
return lhs
else:
return rhs
```

You may even write this version first: it initially seems more
natural (because `_is_node`

is used in the positive). Here’s
the question, though: does it do the same thing as the previous version?
Are you *sure*? Which else is connected to which if? Does every
if have an else? (some linters will suggest you *remove* some of
the elses above, since the if-clause has a `return`

statement
in it!)

The fact of the matter is that we are forced to do truth tables of
every condition in our minds, rather than *saying what we mean*
(as we do in the Haskell version).

The other thing we’re saved from in the Haskell version is accessing
undefined fields. In the Python function, we know accessing
`lhs._data`

is correct since we verified that
`lhs`

is a node. But the logic to do this verification is
complex: we checked if it *wasn’t* a node, and returned if that
was true… so if it *is true* that `lhs`

*isn’t*
a node, we would have returned, but we didn’t, so…

Bear in mind all of these logic checks happened four lines before the
actual access: this can get much uglier in practice! Compare this to the
Haskell version: *we only get to bind variables if we’re sure they
exist*. The syntax itself prevents us from accessing fields which
aren’t defined, in a simple way.

Pattern matching has existed for years in many different forms: even C has switch statements. The added feature of destructuring is available in languages like Swift, Rust, and the whole ML family. Ask for it in your language today!

Now that we have that function, we get to define others in terms of it:

```
insert :: Ord a => a -> Tree a -> Tree a
= merge (Node x Leaf Leaf) insert x
```

```
def insert(element, tree):
= merge(
tree.__dict__
node(element, leaf(), leaf()),
tree ).__dict__.copy()
```

I haven’t mentioned Haskell’s type system so far, as it’s been quite
unobtrusive in the examples. And that’s kind of the point: despite more
complex examples you’ll see online demonstrating the power of type
classes and higher-kinded types, Haskell’s type system *excels*
in these simpler cases.

`merge :: Ord a => Tree a -> Tree a -> Tree a`

Without much ceremony, this signature tells us:

- The function takes two trees, and returns a third.
- Both trees have to be filled with the same types of elements.
- Those elements must have an order defined on them.

I feel a lot of people miss the point of this particular feature. Technically speaking, this feature allows us to write fewer type signatures, as Haskell will be able to guess most of them. Coming from something like Java, you might think that that’s an opportunity to shorten up some verbose code. It’s not! You’ll rarely find a Haskell program these days missing top-level type signatures: it’s easier to read a program with explicit type signatures, so people are advised to put them as much as possible.

(Amusingly, I often find older Haskell code snippets which are entirely devoid of type signatures. It seems that programmers were so excited about Hindley-Milner type inference that they would put it to the test as often as they could.)

Type inference in Haskell is actually useful in a different way.
First, if I write the *implementation* of the `merge`

function, the compiler will tell *me* the signature, which is
extremely helpful for more complex examples. Take the following, for
instance:

`= ((x * 2) ^ 3) / 4 f x `

Remembering precisely which numeric type `x`

needs to be
is a little difficult (`Floating`

? `Real`

?
`Fractional`

?), but if I just ask the compiler it will tell
me without difficulty.

The second use is kind of the opposite: if I have a hole in my
program where I need to fill in some code, Haskell can help me along by
telling me the *type* of that hole automatically. This is often
enough information to figure out the entire implementation! In fact,
there are some programs which will use this capability of the type
checker to fill in the hole with valid programs, synthesising your code
for you.

So often strong type systems can make you feel like you’re fighting more and more against the compiler. I hope these couple examples show that it doesn’t have to be that way.

The next function is “pop-min”:

```
popMin :: Ord a => Tree a -> Maybe (a, Tree a)
Leaf = Nothing
popMin Node x xl xr) = Just (x, merge xl xr) popMin (
```

```
def popMin(tree):
if tree._is_node:
= tree._data
res = merge(
tree.__dict__
tree._lchild,
tree._rchild
).__dict__.copy()return res
else:
raise IndexError
```

At first glance, this function should be right at home in Python. It
*mutates* its input, and it has an error case. The code we’ve
written here for Python is pretty idiomatic, also: other than the ugly
deep copy, we’re basically just mutating the object, and using an
exception for the exceptional state (when the tree is empty). Even the
exception we use is the same exception as when you try and
`pop()`

from an empty list.

The Haskell code here mainly demonstrates a difference in API style
you’ll see between the two languages. If something isn’t found, we just
use `Maybe`

. And instead of mutating the original variable,
we return the new state in the second part of a tuple. What’s nice about
this is that we’re only using simple core features like algebraic data
types to emulate pretty complex features like exceptions in Python.

You may have heard that “Haskell uses monads to do mutation and exceptions”. This is not true. Yes, state and exceptions have patterns which technically speaking are “monadic”. But make no mistake: when we want to model “exceptions” in Haskell, we really just return a maybe (or an either). And when we want to do “mutation”, we return a tuple, where the second element is the updated state. You don’t have to understand monads to use them, and you certainly don’t “need” monads to do them. To drive the point home, the above code could actually equivalently have a type which mentions “the state monad” and “the maybe monad”:

`popMin :: Ord a => StateT (Tree a) Maybe a`

But there’s no need to!

The main part of our task is now done: all that is left is to glue
the various bits and pieces together. Remember, the overall algorithm
builds up the heap from a list, and then tears it down using
`popMin`

. First, then, to build up the heap.

```
listToHeap :: Ord a => [a] -> Tree a
= foldr insert Leaf listToHeap
```

```
def listToHeap(elements):
= leaf()
res for el in elements:
insert(el, res)return res
```

To my eye, the Haskell code here is significantly more “readable”
than the Python. I know that’s a very subjective judgement, but
`foldr`

is a function so often used that it’s immediately
clear what’s happening in this example.

Why didn’t we use a similar function in Python, then? We actually
could have: python does have an equivalent to `foldr`

, called
`reduce`

(it’s been
relegated to functools since Python 3 (also technically it’s
equivalent to `foldl`

, not `foldr`

)). We’re
encouraged *not* to use it, though: the more pythonic code uses a
for loop. Also, it wouldn’t work for our use case: the
`insert`

function we wrote is *mutating*, which
doesn’t gel well with `reduce`

.

I think this demonstrates another benefit of simple, functional APIs.
If you keep things simple, and build things out of functions, they’ll
tend to glue together *well*, without having to write any glue
code yourself. The for loop, in my opinion, is “glue code”. The next
function, `heapToList`

, illustrates this even more so:

```
heapToList :: Ord a => Tree a -> [a]
= unfoldr popMin heapToList
```

```
def heapToList(tree):
= []
res try:
while True:
res.append(popMin(tree))except IndexError:
return res
```

Again, things are kept simple in the Haskell example. We’ve stuck to
data types and functions, and these data types and functions mesh well
with each other. You might be aware that there’s some deep and
interesting mathematics behind the `foldr`

and
`unfoldr`

functions going on, and how
they relate. We don’t need to know any of that here, though: they
just work together well.

Again, Python does have a function which is equivalent to
`unfoldr`

: `iter`

has an overload which will repeatedly call a function until it hits a
sentinel value. But this doesn’t fit with the rest of the iterator
model! Most iterators are terminated with the `StopIteration`

exception; ours (like the `pop`

function on lists) is
terminated by the `IndexError`

exception; and this function
excepts a third version, terminated by a sentinel!

Finally, let’s write `sort`

:

```
sort :: Ord a => [a] -> [a]
sort = heapToList . listToHeap
```

```
def sort(elements):
return heapToList(listToHeap(elements))
```

This is just driving home the point: programs work *well* when
they’re built out of functions, and you *want* your language to
encourage you to build things out of functions. In this case, the
`sort`

function is built out of two smaller ones: it’s the
*essence* of function composition.

So I fully admit that laziness is one of the features of Haskell that does have downsides. I don’t think every language should be lazy, but I did want to say a little about it in regards to the sorting example here.

I tend to think that people overstate how hard it makes reasoning about space: it actually follows pretty straightforward rules, which you can generally step through in yourself (compared to, for instance, rewrite rules, which are often black magic!)

In modern programming, people will tend to use laziness it anyway.
Python is a great example: the itertools
library is almost entirely lazy. Actually making use of the laziness,
though, is difficult and error-prone. Above, for instance, the
`heapToList`

function is lazy in Haskell, but strict in
Python. Converting it to a lazy version is not the most difficult thing
in the world:

```
def heapToList(tree):
try:
while True:
yield popMin(tree)
except IndexError:
pass
```

But now, suddenly, the entire list API won’t work. What’s more, if we
try and access the *first* element of the returned value, we
mutate the whole thing: anyone else looking at the output of the
generator will have it mutated out from under them!

Laziness fundamentally makes this more reusable. Take our
`popMin`

function: if we just want to view the smallest
element, without reconstructing the rest of the tree, we can actually
use `popMin`

as-is. If we don’t use the second element of the
tuple we don’t pay for it. In Python, we need to write a second
function.

Testing the `sort`

function in Haskell is ridiculously
easy. Say we have an example sorting function that we trust, maybe a
slow but obvious insertion sort, and we want to make sure that our fast
heap sort here does the same thing. This is the test:

`-> sort (xs :: [Int]) === insertionSort xs) quickCheck (\xs `

In that single line, the QuickCheck library will automatically generate random input, run each sort function on it, and compare the two outputs, giving a rich diff if they don’t match.

This post was meant to show a few features like pattern-matching, algebraic data types, and function-based APIs in a good light. These ideas aren’t revolutionary any more, and plenty of languages have them, but unfortunately several languages don’t. Hopefully the example here illustrates a little why these features are good, and pushes back against the idea that algebraic data types are too complex for mainstream languages.

This got posted to /r/haskell
and hackernews. You
can find me arguing in the comments there a little bit: I’m `oisdk`

on
hackernews and u/foBrowsing on
reddit.

There are two topics that came up a bunch that I’d like to add to this post. First I’ll just quote one of the comments from Beltiras:

Friend of mine is always trying to convert me. Asked me to read this yesterday evening. This is my take on the article:

Most of my daily job goes into gluing services (API endpoints to databases or other services, some business logic in the middle). I don’t need to see yet another exposition of how to do algorithmic tasks. Haven’t seen one of those since doing my BSc. Show me the tools available to write a daemon, an http server, API endpoints, ORM-type things and you will have provided me with tools to tackle what I do. I’ll never write a binary tree or search or a linked list at work.

If you want to convince me, show me what I need to know to do what I do.

and my response:

I wasn’t really trying to convince anyone to use Haskell at their day job: I am just a college student, after all, so I would have no idea what I was talking about!

I wrote the article a while ago after being frustrated using a bunch of Go and Python at an internship. Often I really wanted simple algebraic data types and pattern-matching, but when I looked up why Go didn’t have them I saw a lot of justifications that amounted to “functional features are too complex and we’re making a simple language. Haskell is notoriously complex”. In my opinion, the

`res, err := fun(); if err != nil`

(for example) pattern was much more complex than the alternative with pattern-matching. So I wanted to write an article demonstrating that, while Haskell has a lot of out-there stuff in it, there’s a bunch of simple ideas which really shouldn’t be missing from any modern general-purpose language.

As to why I used a binary tree as the example, I thought it was pretty self-contained, and I find skew heaps quite interesting.

The second topic was basically people having a go at my ugly Python;
to which I say: fair enough! It is not my best. I wasn’t trying
necessarily to write the best Python I could here, though, rather I was
trying to write the “normal” implementation of a binary tree. If I was
to implement a binary tree of some sort myself, though, I would
certainly write it in an immutable style rather than the style here.
Bear in mind as well that much of what I’m arguing for is stylistic: I
think (for instance) that it would be better to use `reduce`

in Python more, and I think the move away from it is a bad thing. So of
course I’m not going to use reduce when I’m showing the Python version:
I’m doing a compare and contrast!

Yes, I know about the new dataclasses feature. However, it’s wrapped up with the (also new) type hints module, and as such is much more complicated to use. As the purpose of the Python code here is to provide something of a lingua franca for non-Haskellers, I decided against using it. That said, the problems outlined are

*not*solved by dataclasses.↩︎

Tags: Agda

I recently finished my undergrad degree in UCC. I’m putting my final-year project up here for reference purposes.

Here is the pdf.

And here’s a bibtext entry:

```
@thesis{kidney_automatically_2019,
address = {Cork, Ireland},
type = {Bachelor thesis},
title = {Automatically and {Efficiently} {Illustrating} {Polynomial} {Equalities} in {Agda}},
url = {https://doisinkidney.com/pdfs/bsc-thesis.pdf},
abstract = {We present a new library which automates the construction of equivalence proofs between polynomials over commutative rings and semirings in the programming language Agda [20]. It is signi cantly faster than Agda’s existing solver. We use re ection to provide a sim- ple interface to the solver, and demonstrate how to use the constructed proofs to provide step-by-step solutions.},
language = {en},
school = {University College Cork},
author = {Kidney, Donnacha Oisín},
month = apr,
year = {2019}
}
```

Tags: Python

This post is a write-up of a solution to part of a programming puzzle
I did yesterday. It’s a little different than the usual “solution +
theory” approach, though: I’m going to talk about the actual steps you’d
need to take to get to the solution (i.e. what to google, what
intermediate code looks like, etc.). Often write ups like this are
presented as finished artefacts, with little info on the tricks or
techniques the author used to jog their intuition into figuring out the
puzzle (or where some intermediate step requires a leap of insight). In
actual fact, this particular puzzle requires almost no insight *at
all*: I’m going to show how to get to a working solution without
understanding any of the theory behind it!

Spoilers ahead for the google foobar problem “Distract the Guards”.

We’re interested in a particular type of sequences of pairs of numbers. These sequences are generated from a starting pair $n$ and $m$ like so:

If $n$ and $m$ are equal, the sequence stops.

Otherwise, the smaller number is subtracted from the larger, and then the smaller is doubled, and the sequence continues with these two numbers.

Here’s an example starting with 3 and 5:

```
3, 5
6, 2
4, 4
---- done ----
```

Once it hits `4, 4`

, the first condition is met, and the
sequence stops. Not all of these sequences stop, however:

```
1, 4
2, 3
1, 4
---- done ----
```

As you can see, in this case we loop back around to
`1, 4`

: our task is to figure out, given a pair of numbers,
whether the sequence generated by them loops forever, or stops at some
point.

This step is crucial: before trying to figure out any of the deep mathematics behind the problem, write the dumbest thing that could work. You’re going to need it, anyway, to test your faster versions against, and besides, it might be good enough as-is!

```
def sequence(n,m):
while n != m:
yield (n,m)
if n < m:
-= n
m *= 2
n else:
-= m
n *= 2
m
def loops(xs):
= set()
seen for x in xs:
if x in seen:
return True
else:
seen.add(x)return False
def solution(n,m):
return loops(sequence(n,m))
```

The first function actually generates the sequence we’re interested in: it uses python’s generators to do so. The second function is just a generic function that checks a sequence for duplicates. Finally, the last function answers the question we’re interested in.

Next, we want to try and spot some patterns in the answers the function generates. Remember, we’re not really interested in figuring out the theory at this point: if we find out that a loop only happens when both numbers are even (for instance), that’s good enough for us and we can stop there!

We humans are pattern-matching machines: to leverage our abilities, though, we will need to visualise the data somehow. In this case, I’m going to plot a simple scatter graph to the terminal, using the following code (I apologise for my terrible indentation style):

```
print(
'\n'.join(
''.join(
'*' if solution(x,y) else ' '
for x in range(1,81)
)for y in range(100,0,-1)
) )
```

And we get the following output:

```
*************************** ******************************* ********************
**************************** *** *********** ******************************* ***
************* *************** **************************************************
****************************** *************************************************
******************************* ************************************************
******************************** *********************** ******* ***************
********************************* **********************************************
** *************************** *** *********************************************
*********************************** ********************************************
************ ******* *************** *******************************************
***** *********************** ******* *************** *************** **********
************************************** *****************************************
*************************************** ****************************************
******** ******************* *********** ***************************************
***************************************** **************************************
****************************************** ******* *********************** *****
*********** *************** *************** ************************************
******************************************** ***********************************
********************************************* **********************************
************** *********** ******************* *************** *****************
*********************************************** *******************************
************************************************ ***************************** *
***************** ******* *********************** *************************** **
********** *********************** *************** ************************* ***
*************************************************** *********************** ****
**** *************** *** ******************* ******* ********************* *****
***************************************************** ******************* ******
****************************************************** ***************** *******
*********************** ******************************* *************** ********
******************************************************** ************* *********
********* ******************************* *************** *********** **********
********************** *** ******************************* ********* ***********
*********************************************************** ******* ************
************************************************************ ***** *************
********************* ******* ******************************* *** **************
************** *********************** *********************** * ***************
*************************************************************** ****************
******* *********** *********** *************** ************* * ***************
* *********************************************************** *** **************
** ********************************************************* ***** *************
*** *************** *************** *********************** ******* ************
**** ***************************************************** ********* ***********
***** *************************************************** *********** **********
****** *********** ******************* ***************** ************* *********
******* *********************************************** *************** ********
******** *************** ******* ********************* ***************** *******
********* ******* *********************** *********** ******************* ******
********** ***************************************** ********************* *****
*********** *************************************** *********************** ****
************ *** *************************** ***** ************************* ***
************* *************** ******************* *************************** **
****** ******* ********************************* ************* *************** *
*************** ******************************* *******************************
**************** ***************************** *********************************
***************** *************************** **********************************
** *********** *** ******* ******* ********* ***** *********************** *****
******************* *********************** ************************************
******************** ********************* *************************************
***** ******* ******* ******************* *********** *************** **********
********************** ***************** ***************************************
*********************** *************** ****************************************
******** *** *********** ************* ***************** ******* ***************
************************* *********** ******************************************
************************** ********* *******************************************
*********** *************** ******* *********************** ********************
**** *************** ******* ***** ********* ******************************* ***
***************************** *** **********************************************
********** *** *************** * ********************* ******* *****************
******************************* ************************************************
***************************** * ***********************************************
* ******* ******* *********** *** *************** *************** **************
** ************************* ***** *********************************************
*** *********************** ******* ********************************************
**** *** *********** ***** ********* ******* *********************** ***********
***** ******************* *********** *************************************** **
****** ******* ********* ************* *************** ******************* *****
******* *************** *************** ******************************* ********
******** ************* ***************** *************************** ***********
********* *********** ******************* *********************** **************
** *** *** ********* ***** ******* ******* ******************* *********** *****
*********** ******* *********************** *************** ********************
************ ***** ************************* *********** ***********************
***** ******* *** *********** *************** ******* *********************** **
************** * ***************************** *** *****************************
*************** ******************************* ********************************
*** *** ***** * ******* ******* *********** *** *************** ***************
* *********** *** *********************** ******* ******************************
** ********* ***** ******************* *********** *****************************
*** ******* ******* *************** *************** ****************************
**** ***** ********* *********** ******************* *********************** ***
***** *** *********** ******* *********************** *************** **********
** *** * ***** ******* *** *********** *************** ******* *****************
******* *************** ******************************* ************************
***** * *********** *** *********************** ******* ***********************
* *** *** ******* ******* *************** *************** **********************
** * ***** *** *********** ******* *********************** *************** *****
*** ******* *************** ******************************* ********************
* * *** *** ******* ******* *************** *************** *******************
* *** ******* *************** ******************************* ******************
* *** ******* *************** ******************************* *****************
```

There’s a clear pattern there, but it might be easier to see if we
inverted it, plotting those things which *don’t* loop:

```
* *
* * * *
* *
*
*
* * *
*
* * *
*
* * *
* * * * *
*
*
* * *
*
* * *
* * *
*
*
* * * *
* *
* *
* * * *
* * * *
* *
* * * * * *
* *
* *
* * *
* *
* * * *
* * * *
* *
* *
* * * *
* * * *
*
* * * * * * *
* * *
* * *
* * * * *
* * *
* * *
* * * * *
* * *
* * * * *
* * * * *
* * *
* * *
* * * * *
* * * *
* * * * *
* * *
* *
* *
* * * * * * * *
* *
* *
* * * * * *
* *
* *
* * * * * *
* *
* *
* * * *
* * * * * *
* *
* * * * * *
*
* * *
* * * * * * *
* * *
* * *
* * * * * * *
* * * *
* * * * * *
* * * *
* * * *
* * * *
* * * * * * * * *
* * * *
* * * *
* * * * * * *
* * * *
* *
* * * * * * * * * *
* * * * *
* * * * *
* * * * *
* * * * * *
* * * * * *
* * * * * * * * *
* * *
* * * * * * *
* * * * * * *
* * * * * * * *
* * * *
* * * * * * * * *
* * * * *
* * * * * *
```

For this kind of thing it’s also worth getting familiar with gnuplot.

The clearest pattern in the graph above is the straight lines coming
from the origin. This tells me, straight away, that we have an
opportunity for optimisation if we wanted to memoize. We can’t yet be
sure, but it *looks* like every point belongs to one of these
straight lines. That means that once we find a non-looping pair like
`3, 5`

, we can extend that line out to `6, 10`

and
`9, 15`

, etc.

We can also see that the graph has a symmetry through the line
`x = y`

. This means that if `3, 5`

doesn’t loop,
neither does `5, 3`

.

Both of these techniques allow us to reduce the arguments to a canonical form, making the memoization table smaller, also. In code:

```
from fractions import Fraction
def canonical(n,m):
= Fraction(n,m) if n <= m else Fraction(m,n)
f return (f.numerator, f.denominator)
= {}
memo_dict
def solution(n,m):
= canonical(n, m)
c try:
return memo_dict[c]
except KeyError:
= loops(sequence(*c))
r = r
memo_dict[c] return r
```

Now that we have our faster version, we want to be able to quickly check that it’s equivalent to the slow. While Python is usually great for programming puzzles, this step in particular is crying out for something like QuickCheck: without it, we’ll have to roll our own.

```
from random import randrange
for _ in range(1000):
= randrange(1,10000), randrange(1,10000)
x, y if solution_new(x,y) != solution_old(x,y):
print(x,y)
```

We’re not looking for certainty here, just something that will quickly spot an error if one exists.

Now that we’ve made some of the more obvious optimisations, it’s time to move on to finding another pattern in the output. To do this, we’ll use oeis.org. We want to find if the pairs which pass our test follow some sequence which has a simple generating function which we can adapt into a test.

Since the things we’re testing are pairs, rather than individual numbers, we’ll have to fix one of them and see if there’s a pattern in the other.

`print([x for x in range(1,101) if not solution(1, x)])`

This prints the following sequence:

`[1, 3, 7, 15, 31, 63]`

And when we search for it on oeis, we get this as the top result:

```
0, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2047, 4095, 8191, 16383, 32767,
65535, 131071, 262143, 524287, 1048575, 2097151, 4194303, 8388607, 16777215,
33554431, 67108863...
```

And looking at the comments under the sequence, we see the following:

Numbers n for which the expression 2^n/(n+1) is an integer. - Paolo P. Lava, May 12 2006

A test for members of the sequence, all packaged up for us!

But how do we generalise to pairs other than 1? Well, as a total guess, we can see that 1 appears in one place in the formula: why not replace that with the other member of the pair?

After that, we get the following function to test:

```
def solution(n,m):
= canonical(n,m)
nc, mc return bool((2 ** mc) % (nc + mc))
```

And it works!

This last step is pretty straightforward: see if there’s an algorithm
already out there that solves your problem. In our case, taking the
modulus is still pretty slow, but it turns out that modular
exponentiation (i.e. computing expressions of the form
`x^y mod z`

) can be done faster
than the naive way. In fact, python provides this algorithm as a
function in the standard library, making our last version of the
function the following:

```
def solution(n,m):
= canonical(n,m)
nc, mc return bool(pow(2, mc, nc + mc))
```

I’m not sure if this function is fully correct, but it was accepted as a solution to the puzzle.

Anyway, in conclusion: you can get quite far through a programming puzzle by applying some educated guesses and googling!

]]>
Part 7 of a 10-part series on Breadth-First Traversals

Tags: Haskell

Currently, we have several different ways to enumerate a tree in
breadth-first order. The typical solution (which is the usual
recommended approach in imperative programming as well) uses a
*queue*, as described by Okasaki (2000). If we
take the simplest possible queue (a list), we get a quadratic-time
algorithm, with an albeit simple implementation. The next simplest
version is to use a banker’s queue (which is just a pair of lists). From
this version, if we inline and apply identities like the following:

`foldr f b . reverse = foldl (flip f) b`

We’ll get to the following definition:

```
bfe :: Forest a -> [a]
= foldr f b ts []
bfe ts where
Node x xs) fw bw = x : fw (xs : bw)
f (
= []
b [] = foldl (foldr f) b qs [] b qs
```

We can get from this function to others (like one which uses a corecursive queue, and so on) through a similar derivation. I might some day write a post on each derivation, starting from the simple version and demonstrating how to get to the more efficient at each step.

For today, though, I’m interested in the *traversal* of a rose
tree. Traversal, here, of course, is in the applicative sense.

Thus far, I’ve managed to write linear-time traversals, but they’ve been unsatisfying. They work by enumerating the tree, traversing the effectful function over the list, and then rebuilding the tree. Since each of those steps only takes linear time, the whole thing is indeed a linear-time traversal, but I hadn’t been able to fuse away the intermediate step.

The template for the algorithm I want comes from the
`Phases`

applicative (Easterly 2019):

```
data Phases f a where
Lift :: f a -> Phases f a
(:<*>) :: f (a -> b) -> Phases f a -> Phases f b
```

We can use it to write a breadth-first traversal like so:

```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= runPhases . go
bft f where
Node x xs) = liftA2 Node (Lift (f x)) (later (traverse go xs)) go (
```

The key component that makes this work is that it combines applicative effects in parallel:

```
instance Functor f => Functor (Phases f) where
fmap f (Lift x) = Lift (fmap f x)
fmap f (fs :<*> xs) = fmap (f.) fs :<*> xs
instance Applicative f => Applicative (Phases f) where
pure = Lift . pure
Lift fs <*> Lift xs = Lift (fs <*> xs)
:<*> gs) <*> Lift xs = liftA2 flip fs xs :<*> gs
(fs Lift fs <*> (xs :<*> ys) = liftA2 (.) fs xs :<*> ys
:<*> gs) <*> (xs :<*> ys) = liftA2 c fs xs :<*> liftA2 (,) gs ys
(fs where
~(x,y) = f x (g y) c f g
```

We’re also using the following helper functions:

```
runPhases :: Applicative f => Phases f a -> f a
Lift x) = x
runPhases (:<*> xs) = fs <*> runPhases xs
runPhases (fs
later :: Applicative f => Phases f a -> Phases f a
= (:<*>) (pure id) later
```

The problem is that it’s quadratic: the `traverse`

in:

`Node x xs) = liftA2 Node (Lift (f x)) (later (traverse go xs)) go (`

Hides some expensive calls to `<*>`

.

The problem with the `Phases`

traversal is actually
analogous to another function for enumeration: `levels`

from
Gibbons
(2015).

```
levels :: Tree a -> [[a]]
Node x xs) = [x] : foldr lzw [] (map levels xs)
levels (where
= ys
lzw [] ys = xs
lzw xs [] :xs) (y:ys) = (x ++ y) : lzw xs ys lzw (x
```

`lzw`

takes the place of `<*>`

here, but
the overall issue is the same: we’re zipping at every point, making the
whole thing quadratic.

However, from the above function we *can* derive a linear time
enumeration. It looks like this:

```
levels :: Tree a -> [[a]]
= f ts []
levels ts where
Node x xs) (q:qs) = (x:q) : foldr f qs xs
f (Node x xs) [] = [x] : foldr f [] xs f (
```

Our objective is clear, then: try to derive the linear-time
implementation of `bft`

from the quadratic, in a way
analogous to the above two functions. This is actually relatively
straightforward once the target is clear: the rest of this post is
devoted to the derivation.

First, we start off with the original `bft`

.

```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= runPhases . go
bft f where
Node x xs) = liftA2 Node (Lift (f x)) (later (traverse go xs)) go (
```

`traverse`

.
```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= runPhases . go
bft f where
Node x xs) = liftA2 Node (Lift (f x)) (later (go' xs))
go (= foldr (liftA2 (:) . go) (pure []) go'
```

`go''`

.
```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= runPhases . go
bft f where
Node x xs) = liftA2 Node (Lift (f x)) (later (go' xs))
go (= foldr go'' (pure [])
go' Node x xs) ys = liftA2 (:) (liftA2 Node (Lift (f x)) (later (go' xs))) ys go'' (
```

Inline `go'`

(and rename `go''`

to
`go'`

)

```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= runPhases . go
bft f where
Node x xs) = liftA2 Node (Lift (f x)) (later (foldr go' (pure []) xs))
go (Node x xs) ys = liftA2 (:) (liftA2 Node (Lift (f x)) (later (foldr go' (pure []) xs))) ys go' (
```

Definition of `liftA2`

```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= runPhases . go
bft f where
Node x xs) = liftA2 Node (Lift (f x)) (later (foldr go' (pure []) xs))
go (Node x xs) ys = liftA2 (:) (fmap Node (f x) :<*> (foldr go' (pure []) xs)) ys go' (
```

Definition of `liftA2`

(pattern-matching on
`ys`

)

```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= runPhases . go
bft f where
Node x xs) = liftA2 Node (Lift (f x)) (later (foldr go' (pure []) xs))
go (Node x xs) (Lift ys) = fmap (((:).) . Node) (f x) :<*> (foldr go' (pure []) xs) <*> Lift ys
go' (Node x xs) (ys :<*> zs) = fmap (((:).) . Node) (f x) :<*> (foldr go' (pure []) xs) <*> ys :<*> zs go' (
```

`<*>`

.
```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= runPhases . go
bft f where
Node x xs) = liftA2 Node (Lift (f x)) (later (foldr go' (pure []) xs))
go (Node x xs) (Lift ys) = liftA2 flip (fmap (((:).) . Node) (f x)) ys :<*> foldr go' (pure []) xs
go' (Node x xs) (ys :<*> zs) = liftA2 c (fmap (((:).) . Node) (f x)) ys :<*> liftA2 (,) (foldr go' (pure []) xs) zs
go' (where
~(x,y) = f x (g y) c f g
```

Fuse `liftA2`

with `fmap`

```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= runPhases . go
bft f where
Node x xs) = liftA2 Node (Lift (f x)) (later (foldr go' (pure []) xs))
go (Node x xs) (Lift ys) = liftA2 (flip . (((:).) . Node)) (f x) ys :<*> foldr go' (pure []) xs
go' (Node x xs) (ys :<*> zs) = liftA2 (c . (((:).) . Node)) (f x) ys :<*> liftA2 (,) (foldr go' (pure []) xs) zs
go' (where
~(x,y) = f x (g y) c f g
```

```
bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= go
bft f where
Node x xs) = liftA2 Node (f x) (runPhases (foldr go' (pure []) xs))
go (
Node x xs) (Lift ys) = liftA2 (\y zs ys -> Node y ys : zs) (f x) ys :<*> foldr go' (pure []) xs
go' (Node x xs) (ys :<*> zs) = liftA2 c (f x) ys :<*> liftA2 (,) (foldr go' (pure []) xs) zs
go' (where
~(ys,z) = Node y ys : g z c y g
```

At this point, we actually hit a wall: the expression

`foldr go' (pure []) xs) zs liftA2 (,) (`

Is what makes the whole thing quadratic. We need to find a way to
thread that `liftA2`

along with the fold to get it to linear.
This is the only real trick in the derivation: I’ll use polymorphic
recursion to avoid the extra zip.

```
bft :: forall f a b. Applicative f => (a -> f b) -> Tree a -> f (Tree b)
= go
bft f where
Node x xs) = liftA2 (\y (ys,_) -> Node y ys) (f x) (runPhases (foldr go' (pure ([],())) xs))
go (
go' :: forall c. Tree a -> Phases f ([Tree b], c) -> Phases f ([Tree b], c)
Node x xs) ys@(Lift _) = fmap (\y -> first (pure . Node y)) (f x) :<*> foldr go' ys xs
go' (Node x xs) (ys :<*> zs) = liftA2 c (f x) ys :<*> foldr go' (fmap ((,) []) zs) xs
go' (where
~(ys,z) = first (Node y ys:) (g z) c y g
```

And that’s it!

We can finally write a slightly different version that avoids some
unnecessary `fmap`

s by basing `Phases`

on
`liftA2`

rather than `<*>`

.

```
data Levels f a where
Now :: a ->
```