Donnacha Oisín Kidney's Blog

POPL Paper—Algebraic Effects Meet Hoare Logic in Cubical Agda

Donnacha Oisín Kidney — Tue, 07 Nov 2023 00:00:00 UT

Posted on November 7, 2023

Tags: Agda

New paper: “Algebraic Effects Meet Hoare Logic in Cubical Agda”, by myself, Zhixuan Yang, and Nicolas Wu, will be published at POPL 2024.

Zhixuan has a nice summary of it here.

The preprint is available here.

Lazily Grouping in Haskell

Donnacha Oisín Kidney — Mon, 17 Oct 2022 00:00:00 UT

Posted on October 17, 2022

Tags: Haskell

Here’s a cool trick:

minimum :: Ord a => [a] -> a
minimum = head . sort

This is $\mathcal{O}(n)$ in Haskell, not $\mathcal{O}(n \log n)$ as you might expect. And this isn’t because Haskell is using some weird linear-time sorting algorithm; indeed, the following is $\mathcal{O}(n \log n)$ :

maximum :: Ord a => [a] -> a
maximum = last . sort

No: since the implementation of minimum above only demands the first element of the list, and since sort has been carefully implemented, only a linear amount of work will be done to retrieve it.

It’s not easy to structure programs to have the same property as sort does above: to be maximally lazy, such that unnecessary work is not performed. Today I was working on a maximally lazy implementation of the following program:

groupOn :: Eq k => (a -> k) -> [a] -> [(k,[a])]
groupOn = ...

>>> groupOn (`rem` 2) [1..5]
[(1,[1,3,5]),(0,[2,4])]

>>> groupOn (`rem` 3) [5,8,3,6,2]
[(2,[5,8,2]),(0,[3,6])]

This function groups the elements of a list according to some key function. The desired behaviour here is a little subtle: we don’t want to just group adjacent elements, for instance.

groupOn (`rem` 3) [5,8,3,6,2] ≢ [(2,[5,8]),(0,[3,6]),(2,[2])]

And we don’t want to reorder the elements of the list by the keys:

groupOn (`rem` 3) [5,8,3,6,2] ≢ [(0,[3,6]),(2,[5,8,2])]

These constraints make it especially tricky to make this function lazy. In fact, at first glance, it seems impossible. What should, for instance, groupOn id [1..] return? It can’t even fill out the first group, since it will never find another 1. However, it can fill out the first key. And, in fact, the second. And it can fill out the first element of the first group. Precisely:

groupOn id [1..] ≡ [(1,1:⊥), (2,2:⊥), (3,3:⊥), ...

Another example is groupOn id (repeat 1), or groupOn id (cycle [1,2,3]). These each have partially-defined answers:

groupOn id (repeat 1)      ≡ (1,repeat 1):⊥

groupOn id (cycle [1,2,3]) ≡ (1,repeat 1):(2,repeat 2):(3,repeat 3):⊥

So there is some kind of well-defined lazy semantics for this function. The puzzle I was interested in was defining an efficient implementation for these semantics.

The Slow Case

The first approximation to a solution I could think of is the following:

groupOn :: Ord k => (a -> k) -> [a] -> [(k, [a])]
groupOn k = Map.toList . Map.fromListWith (++) . map (\x -> (k x, [x]))

In fact, if you don’t care about laziness, this is probably the best solution: it’s $\mathcal{O}(n \log n)$ , it performs well (practically as well as asymptotically), and it has the expected results.

However, there are problems. Primarily this solution cares about ordering, which we don’t want. We want to emit the results in the same order that they were in the original list, and we don’t necessarily want to require an ordering on the elements (for the efficient solution we will relax this last constraint).

Instead, let’s implement our own “map” type that is inefficient, but more general.

type Map a b = [(a,b)]

insertWith :: Eq a => (b -> b -> b) -> a -> b -> Map a b -> Map a b
insertWith f k v [] = [(k,v)]
insertWith f k v ((k',v'):xs)
  | k == k'   = (k',f v v') : xs
  | otherwise = (k',v') : insertWith f k v xs

groupOn :: Eq k => (a -> k) -> [a] -> [(k, [a])]
groupOn k = foldr (uncurry (insertWith (++))) [] . map (\x -> (k x, [x]))

The problem here is that it’s not lazy enough. insertWith is strict in its last argument, which means that using foldr doesn’t gain us anything laziness-wise.

There is some extra information we can use to drive the result: we know that the result will have keys that are in the same order as they appear in the list, with duplicates removed:

groupOn :: Eq k => (a -> k) -> [a] -> [(k, [a])]
groupOn k xs = map _ ks
  where
    ks = map k xs

From here, we can get what the values should be from each key by filtering the original list:

groupOn :: Eq k => (a -> k) -> [a] -> [(k,[a])]
groupOn key xs = map (\k -> (k, filter ((k==) . key) xs)) (nub (map key xs))

Using a kind of Schwartzian transform yields the following slight improvement:

groupOn :: Eq k => (a -> k) -> [a] -> [(k,[a])]
groupOn key xs = map (\k -> (k , map snd (filter ((k==) . fst) ks))) (nub (map fst ks))
  where
    ks = map (\x -> (key x, x)) xs

But this traverses the same list multiple times unnecessarily. The problem is that we’re repeating a lot of work between nub and the rest of the algorithm.

The following is much better:

groupOn :: Eq k => (a -> k) -> [a] -> [(k,[a])]
groupOn key = go . map (\x -> (key x, x)) 
  where
    go [] = []
    go ((k,x):xs) = (k,x:map snd y) : go ys
      where
        (y,ys) = partition ((k==).fst) xs

First, we perform the Schwartzian transform optimisation. The work of the algorithm is done in the go helper. The idea is to filter out duplicates as we encounter them: when we encounter (k,x) we can keep it immediately, but then we split the rest of the list into the components that have the same key as this element, and the ones that differ. The ones that have the same key can form the collection for this key, and those that differ are what we recurse on.

This partitioning also avoids re-traversing elements we know to be already accounted for in a previous group. I think that this is the most efficient (modulo some inlining and strictness improvements) algorithm that can do groupOn with just an Eq constraint.

A Faster Version

The reason that the groupOn above is slow is that every element returned has to traverse the entire rest of the list to remove duplicates. This is a classic pattern of quadratic behaviour: we can improve it by using the same trick as quick sort, by partitioning the list into lesser and greater elements on every call.

groupOnOrd :: Ord k => (a -> k) -> [a] -> [(k,[a])]
groupOnOrd key = go . map (\x -> (key x, x)) 
  where
    go [] = []
    go ((k,x):xs) = (k,x:e) : go lt ++ go gt
      where
        (e,lt,gt) = foldr split ([],[],[]) xs
        split ky@(k',y) ~(e,lt,gt) = case compare k' k of
          LT -> (e, ky:lt, gt)
          EQ -> (y:e, lt, gt)
          GT -> (e, lt, ky:gt)

While this is $\mathcal{O}(n \log n)$ , and it does group elements, it also reorders the underlying list. Let’s fix that by tagging the incoming elements with their positions, and then using those positions to order them back into their original configuration:

groupOnOrd :: Ord k => (a -> k) -> [a] -> [(k,[a])]
groupOnOrd k = map (\(_,k,xs) -> (k,xs)) . go . zipWith (\i x -> (i, k x, x)) [0..]
  where
    go [] = []
    go ((i, k, x):xs) = (i, k, x : e) : merge (go l) (go g)
      where 
        (e, l, g) = foldr split ([],[],[]) xs
        
        split ky@(_,k',y) ~(e, l, g) = case compare k' k of
          LT -> (e  , ky : l,      g)
          EQ -> (y:e,      l,      g)
          GT -> (e  ,      l, ky : g)
          
    merge [] gt = gt
    merge lt [] = lt
    merge (l@(i,_,_):lt) (g@(j,_,_):gt)
      | i <= j    = l : merge lt (g:gt)
      | otherwise = g : merge (l:lt) gt

This is close, but still not right. This isn’t yet lazy. The merge function is strict in both arguments.

However, we have all the information we need to unshuffle the lists without having to inspect them. In split, we know which direction we put each element: we can store that info without using indices.

groupOnOrd :: Ord k => (a -> k) -> [a] -> [(k,[a])]
groupOnOrd k = catMaybes . go . map (\x -> (k x, x))
  where
    go [] = []
    go ((k,x):xs) = Just (k, x : e) : merge m (go l) (go g)
      where 
        (e, m, l, g) = foldr split ([],[],[],[]) xs
        
        split ky@(k',y) ~(e, m, l, g) = case compare k' k of
          LT -> (  e, LT : m, ky : l,      g)
          EQ -> (y:e, EQ : m,      l,      g)
          GT -> (  e, GT : m,      l, ky : g)
          
    merge []        lt     gt     = []
    merge (EQ : xs) lt     gt     = Nothing : merge xs lt gt
    merge (LT : xs) (l:lt) gt     = l       : merge xs lt gt
    merge (GT : xs) lt     (g:gt) = g       : merge xs lt gt

What we generate here is a [Ordering]: this list tells us the result of all the compare operations on the input list. Then, in merge, we invert the action of split, rebuilding the original list without inspecting either lt or gt.

And this solution works! It’s $\mathcal{O}(n \log n)$ , and fully lazy.

>>> map fst . groupOnOrd id $ [1..]
[1..]

>>> groupOnOrd id $ cycle [1,2,3]
(1,repeat 1):(2,repeat 2):(3,repeat 3):⊥

>>> groupOnOrd (`rem` 3) [1..]
(1,[1,4..]):(2,[2,5..]):(0,[3,6..]):⊥

The finished version of these two functions, along with some benchmarks, is available here.

Depth Comonads

Donnacha Oisín Kidney — Tue, 03 May 2022 00:00:00 UT

Posted on May 3, 2022

Tags: Agda

I haven’t written much on this blog recently: since starting a PhD all of my writing output has gone towards paper drafts and similar things. Recently, though, I’ve been thinking about streams, monoids, and comonads and I haven’t manage to wrangle those thoughts into something coherent enough for a paper. This blog post is a collection of those (pretty disorganised) thoughts. The hope is that writing them down will force me to clarify things, but consider this a warning that the rest of this post may well be muddled and confusing.

Streams

The first thing I want to talk about is streams.

record Stream (A : Type) : Type where
  coinductive
  field head : A
        tail : Stream A

This representation is coinductive: the type above contains infinite values. Agda, unlike Haskell, treats inductive and coinductive types differently (this is why we need the coinductive keyword in the definition). One of the differences is that it doesn’t check termination for construction of these values:

alternating : Stream Bool
alternating .head       = true
alternating .tail .head = false
alternating .tail .tail = alternating

alternating :: [Bool]
alternating = True : False : alternating

We have the equivalent in Haskell on the right. We’re also using some fancy syntax for the Agda code: copatterns (Abel and Pientka 2013).

Note that this type is only definable in a language with some notion of laziness. If we tried to define a value like alternating above in OCaml we would loop. Haskell has no problem, and Agda—through its coinduction mechanism—can handle it as well.

Update 4-5-22: thanks to Arnaud Spiwack (@aspiwack) for correcting me on this, it turns out the definition of alternating above can be written in Ocaml, even without laziness. Apparently Ocaml has a facility for strict cyclic data structures. Also, I should be a little more precise with what I’m saying above: even without the extra facility for strict cycles, you can of course write a lazy list with some kind of lazy wrapper type.

There is, however, an isomorphic type that can be defined without coinduction:

(notice that, in this form, the function ℕ-alternating is the same function as even : ℕ → Bool)

In fact, we can convert from the coinductive representation to the inductive one. This conversion function is more familiarly recognisable as the indexing function:

_[_] : Stream A → ℕ-Stream A
xs [ zero  ] = xs .head
xs [ suc n ] = xs .tail [ n ]

I’m not just handwaving when I say the two representations are isomorphic: we can prove this isomorphism, and, in Cubical Agda, we can use this to transport programs on one representation to the other.

Proof of isomorphism

Weighted Search Package

Donnacha Oisín Kidney — Sun, 29 Aug 2021 00:00:00 UT

Posted on August 29, 2021

Tags: Haskell

I have packaged up the more interesting bits from the Algebras for Weighted Search paper and put it up on hackage.

You can see it here.

It contains the HeapT monad, the Monus class, and an implementation of Dijkstra’s algorithm, the Viterbi algorithm, and probabilistic parsing.

Check it out!

ICFP Paper—Algebras for Weighted Search

Donnacha Oisín Kidney — Mon, 21 Jun 2021 00:00:00 UT

Posted on June 21, 2021

Tags: Haskell, Agda

The paper “Algebras for Weighted Search” has just been accepted unconditionally to ICFP. I wrote it with my supervisor, Nicolas Wu, and it covers a lot of the topics I’ve written about on this blog (including hyperfunctions and breadth-first traversals).

The preprint is available here.

Hyperfunctions

Donnacha Oisín Kidney — Sun, 14 Mar 2021 00:00:00 UT

Posted on March 14, 2021

Tags: Haskell

Check out this type:

newtype a -&> b = Hyp { invoke :: (b -&> a) -> b }

This a hyperfunction (J. Launchbury, Krstic, and Sauerwein 2013; 2000; 2000), and I think it’s one of the weirdest and most interesting newtypes you can write in Haskell.

The first thing to notice is that the recursion pattern is weird. For a type to refer to itself recursively on the left of a function arrow is pretty unusual, but on top of that the recursion is non-regular. That means that the recursive reference has different type parameters to its parent: a -&> b is on the left-hand-side of the equals sign, but on the right we refer to b -&> a.

Being weird is reason enough to write about them, but what’s really shocking about hyperfunctions is that they’re useful. Once I saw the definition I realised that a bunch of optimisation code I had written (to fuse away zips in particular) was actually using hyperfunctions (Ghani et al. 2005). After that, I saw them all over the place: in coroutine implementations, queues, breadth-first traversals, etc.

Anyways, since coming across hyperfunctions a few months ago I thought I’d do a writeup on them. I’m kind of surprised they’re not more well-known, to be honest: they’re like a slightly more enigmatic Cont monad, with a far cooler name. Let’s get into it!

What Are Hyperfunctions?

The newtype noise kind of hides what’s going on with hyperfunctions: expanding the definition out might make things slightly clearer.

type a -&> b = (b -&> a) -> b
             = ((a -&> b) -> a) -> b
             = (((b -&> a) -> b) -> a) -> b
             = ((((... -> b) -> a) -> b) -> a) -> b

So a value of type a -&> b is kind of an infinitely left-nested function type. One thing worth noticing is that all the as are in negative positions and all the bs in positive. This negative and positive business basically refers to the position of arguments in relation to a function arrow: to the left are negatives, and to the right are positives, but two negatives cancel out.

((((... -> b) -> a) -> b) -> a) -> b
           +     -     +     -     +

All the things in negative positions are kind of like the things a function “consumes”, and positive positions are the things “produced”. It’s worth fiddling around with very nested function types to get a feel for this notion. For hyperfunctions, though, it’s enough to know that a -&> b does indeed (kind of) take in a bunch of as, and it kind of produces bs.

By the way, one of the ways to get to grips with polarity in this sense is to play around with the Cont monad, codensity monad, or selection monad (Hedges 2015). If you do, you may notice one of the interesting parallels about hyperfunctions: the type a -&> a is in fact the fixpoint of the continuation monad (Fix (Cont a)). Suspicious!

Hyperfunctions Are Everywhere

Before diving further into the properties of the type itself, I’d like to give some examples of how it can show up in pretty standard optimisation code.

Zips

Let’s say you wanted to write zip with foldr (I have already described this particular algorithm in a previous post). Not foldr on the left argument, mind you, but foldr on both. If you proceed mechanically, replacing every recursive function with foldr, you can actually arrive at a definition:

zip :: [a] -> [b] -> [(a,b)]
zip xs ys = foldr xf xb xs (foldr yf yb ys)
  where
    xf x xk yk = yk x xk
    xb _ = []
    
    yf y yk x xk = (x,y) : xk yk
    yb _ _ = []

In an untyped language, or a language with recursive types, such a definition would be totally fine. In Haskell, though, the compiler will complain with the following:

• Occurs check: cannot construct the infinite type:
    t0 ~ a -> (t0 -> [(a, b)]) -> [(a, b)]

Seasoned Haskellers will know, though, that this is not a type error: no, this is a type recipe. The compiler is telling you what parameters it wants you to stick in the newtype:

newtype Zip a b = 
  Zip { runZip :: a -> (Zip a b -> [(a,b)]) -> [(a,b)] }

zip :: forall a b. [a] -> [b] -> [(a,b)]
zip xs ys = xz yz
  where
    xz :: Zip a b -> [(a,b)]
    xz = foldr f b xs
      where    
        f x xk yk = runZip yk x xk
        b _ = []
    
    yz :: Zip a b
    yz = foldr f b ys
      where
        f y yk = Zip (\x xk -> (x,y) : xk yk)
        b = Zip (\_ _ -> [])

And here we see the elusive hyperfunction: hidden behind a slight change of parameter order, Zip a b is in fact the same as [(a,b)] -&> (a -> [(a,b)]).

zip :: forall a b. [a] -> [b] -> [(a,b)]
zip xs ys = invoke xz yz
  where
    xz :: (a -> [(a,b)]) -&> [(a,b)]
    xz = foldr f b xs
      where
        f x xk = Hyp (\yk -> invoke yk xk x)
        b = Hyp (\_ -> [])
    
    yz :: [(a,b)] -&> (a -> [(a,b)]) 
    yz = foldr f b ys
      where
        f y yk = Hyp (\xk x -> (x,y) : invoke xk yk)
        b = Hyp (\_ _ -> [])

BFTs

In another previous post I derived the following function to do a breadth-first traversal of a tree:

data Tree a = a :& [Tree a]

newtype Q a = Q { q :: (Q a -> [a]) -> [a] }

bfe :: Tree a -> [a]
bfe t = q (f t b) e
  where
    f :: Tree a -> Q a -> Q a
    f (x :& xs) fw = Q (\bw -> x : q fw (bw . flip (foldr f) xs))
    
    b :: Q a
    b = Q (\k -> k b)
    
    e :: Q a -> [a]
    e (Q q) = q e

That Q type there is another hyperfunction.

bfe :: Tree a -> [a]
bfe t = invoke (f t e) e
  where
    f :: Tree a -> ([a] -&> [a]) -> ([a] -&> [a])
    f (x :& xs) fw = Hyp (\bw -> x : invoke fw (Hyp (invoke bw . flip (foldr f) xs)))
    
    e :: [a] -&> [a]
    e = Hyp (\k -> invoke k e)

One of the problems I had with the above function was that it didn’t terminate: it could enumerate all the elements of the tree but it didn’t know when to stop. A similar program (Allison 2006; described and translated to Haskell in Smith 2009) manages to solve the problem with a counter. Will it shock you to find out this solution can also be encoded with a hyperfunction?

bfe t = invoke (f t (Hyp b)) e 1
  where
    f :: Tree a -> (Int -> [a]) -&> (Int -> [a]) 
                -> (Int -> [a]) -&> (Int -> [a])
    f (x :& xs) fw =
      Hyp (\bw n -> x : invoke fw (Hyp (\k m -> invoke bw (foldr f k xs) (m+1))) n)
              
    e :: (Int -> [a]) -&> (Int -> [a])
    e = Hyp (\k -> invoke k e)
    
    b x 0 = []
    b x n = invoke x (Hyp b) (n-1)

(my version here is actually a good bit different from the one in Smith 2009, but the basic idea is the same)

Coroutines

Hyperfunctions seem to me to be quite deeply related to coroutines. At the very least several of the types involved in coroutine implementations are actual hyperfunctions. The ProdPar and ConsPar types from Pieters and Schrijvers (2019) are good examples:

newtype ProdPar a b = ProdPar (ConsPar a b -> b) 
newtype ConsPar a b = ConsPar (a -> ProdPar a b -> b)

ProdPar a b is isomorphic to (a -> b) -&> b, and ConsPar a b to b -&> (a -> b), as witnessed by the following functions:

Conversion functions between ProdPar, ConsPar and hyperfunctions

Master's Thesis

Donnacha Oisín Kidney — Mon, 04 Jan 2021 00:00:00 UT

Posted on January 4, 2021

Tags: Agda

The final version of my master’s thesis got approved recently so I thought I’d post it here for people who might be interested.

Here’s the university record.

Here’s the pdf.

And all of the theorems in the thesis have been formalised in Agda. The code is organised to follow the structure of the pdf here.

The title of the thesis is “Finiteness in Cubical Type Theory”: basically it’s all about formalising the notion of “this type is finite” in CuTT. I also wanted to write something that could serve as a kind of introduction to some components of modern dependent type theory which didn’t go the standard length-indexed vector route.

Trees indexed by a Cayley Monoid

Donnacha Oisín Kidney — Sun, 27 Dec 2020 00:00:00 UT

Posted on December 27, 2020

Tags: Haskell

The Cayley monoid is well-known in Haskell (difference lists, for instance, are a specific instance of the Cayley monoid), because it gives us $O(1)$ <>. What’s less well known is that it’s also important in dependently typed programming, because it gives us definitional associativity. In other words, the type x . (y . z) is definitionally equal to (x . y) . z in the Cayley monoid.

Some helpers and extra code

Enumerating Trees

Donnacha Oisín Kidney — Mon, 14 Dec 2020 00:00:00 UT

Posted on December 14, 2020

Tags: Agda, Haskell

Consider the following puzzle:

Given a list of $n$ labels, list all the trees with those labels in order.

For instance, given the labels [1,2,3,4], the answer (for binary trees) is the following:

┌1     ┌1      ┌1     ┌1     ┌1
┤      ┤      ┌┤     ┌┤     ┌┤
│┌2    │ ┌2   ││┌2   │└2    │└2
└┤     │┌┤    │└┤    ┤     ┌┤
 │┌3   ││└3   │ └3   │┌3   │└3
 └┤    └┤     ┤      └┤    ┤
  └4    └4    └4      └4   └4

This problem (the “enumeration” problem) turns out to be quite fascinating and deep, with connections to parsing and monoids. It’s also just a classic algorithmic problem which is fun to try and solve.

The most general version of the algorithm is on forests of rose trees:

data Rose a = a :& Forest a
type Forest a = [Rose a]

It’s worth having a go at attempting it yourself, but if you’d just like to see the slick solutions the following is one I’m especially proud of:

Solution to the Enumeration Problem on Forests of Rose Trees

A Queue for Effectful Breadth-First Traversals

Donnacha Oisín Kidney — Mon, 23 Nov 2020 00:00:00 UT

Posted on November 23, 2020

Part 10 of a 10-part series on Breadth-First Traversals

Tags: Haskell

We pick up the story again at the question of a breadth-first (Applicative) traversal of a rose tree (Gibbons 2015). In the last post, I finally came up with an implementation I was happy with:

data Tree a = a :& [Tree a]

bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
bft f (x :& xs) = liftA2 (:&) (f x) (bftF f xs)

bftF :: Applicative f => (a -> f b) -> [Tree a] -> f [Tree b]
bftF t = fmap head . foldr (<*>) (pure []) . foldr f [pure ([]:)]
  where
    f (x :& xs) (q : qs) = liftA2 c (t x) q : foldr f (p qs) xs
    
    p []     = [pure ([]:)]
    p (x:xs) = fmap (([]:).) x : xs

    c x k (xs : ks) = ((x :& xs) : y) : ys
      where (y : ys) = k ks

It has the correct semantics and asymptotics.

tree =
    1 :&
      [ 2 :&
          [ 5 :&
              [ 9  :& []
              , 10 :& []]
          , 6 :& []]
      , 3 :& []
      , 4 :&
          [ 7 :&
              [ 11 :& []
              , 12 :& []]
          , 8 :& []]]
          
>>> bft print tree
1
2
3
4
5
6
7
8
9
10
11
12
() :&
   [ () :&
        [ () :&
             [ () :& []
             , () :& []]
        , () :& []]
   , () :&   []
   , () :&
        [ () :&
             [ () :& []
             , () :& []]
        , () :& []]]

But it’s quite difficult to understand, and doesn’t lend much insight into what’s going on with the whole “breadth-first” notion. The technique the function uses also isn’t reusable.

A much nicer function uses the Phases Applicative (Easterly 2019):

bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
bft f = runPhases . go
  where
    go (x :& xs) = liftA2 (:&) (Lift (f x)) (later (traverse go xs))

But this function is quadratic.

So the task for this post today is to derive a type like the Phases type with a later operation, but which has the appropriate performance characteristics. At the end I’ll look into what the theoretical properties of this type are.

A Free Applicative

At its core, the Phases type is basically a free Applicative (Capriotti and Kaposi 2014). I’ll reimplement it here as a slightly different free Applicative (one that’s based on liftA2 rather than <*>):

data Free f a where
  Pure :: a -> Free f a
  Lift :: (a -> b -> c) -> f a -> Free f b -> Free f c
  
lower :: Applicative f => Free f a -> f a
lower (Pure x) = pure x
lower (Lift f x xs) = liftA2 f x (lower xs)

The key with the Phases type is to observe that there’s actually two possible implementations of Applicative for the Free type above: one which makes it the “correct” free applicative:

instance Applicative (Free f) where
  pure = Pure

  liftA2 c (Pure x) ys = fmap (c x) ys
  liftA2 c (Lift f x xs) ys = Lift (\x (y,z) -> c (f x y) z) x (liftA2 (,) xs ys)

And then one which zips effects together:

instance Applicative f => Applicative (Free f) where
  pure = Pure
  
  liftA2 c (Pure x) ys = fmap (c x) ys
  liftA2 c xs (Pure y) = fmap (flip c y) xs
  liftA2 c (Lift f x xs) (Lift g y ys) = 
    Lift 
      (\(x,y) (xs,ys) -> c (f x xs) (g y ys)) 
      (liftA2 (,) x y) 
      (liftA2 (,) xs ys)

This second instance makes the Free type into not a free Applicative at all: instead it’s some kind of Applicative transformer which we can use to reorder effects. Since effects are combined only when they’re at the same point in the list, we can use it to do our breadth-first traversal.

As an aside, from this perspective it’s clear that this is some kind of FunList (van Laarhoven 2009): this opens up a lot of interesting curiosities about the type, since that type in particular is quite well-studied.

Anyway, we’re able to do the later operation quite simply:

later :: Free f a -> Free f a
later = Lift (const id) (pure ())

Making it Efficient

The problem at the moment is that the Applicative instance has an $\mathcal{O}(n)$ liftA2 implementation: this translates into an $\mathcal{O}(n^2)$ traversal overall.

If we were working in a more simple context of just enumerating the contents of the tree, we might at this point look to something like difference lists: these use the cayley transform on the list monoid to turn the append operation from $\mathcal{O}(n)$ to $\mathcal{O}(n^2)$ . It turns out that there is a similar cayley transformation for Applicative functors (Rivas and Jaskelioff 2014; Rivas, Jaskelioff, and Schrijvers 2015):

newtype Day f a = Day { runDay :: ∀ b. f b -> f (a, b) }

instance Functor f => Functor (Day f) where
  fmap f xs = Day (fmap (first f) . runDay xs)
  
instance Functor f => Applicative (Day f) where
  pure x = Day (fmap ((,) x))
  liftA2 c xs ys =
    Day (fmap (\(x,(y,z)) -> (c x y, z)) . runDay xs . runDay ys)

And with this type we can implement our queue of applicative effects:

type Queue f = Day (Free f)

runQueue :: Applicative f => Queue f a -> f a
runQueue = fmap fst . lower . flip runDay (Pure ())

now :: Applicative f => f a -> Queue f a
now xs = Day \case
  Pure x      -> Lift (,) xs (Pure x)
  Lift f y ys -> Lift (\(x,y) z -> (x, f y z)) (liftA2 (,) xs y) ys

later :: Applicative f => Queue f a -> Queue f a
later xs = Day \case
  Pure x      -> Lift (const id) (pure ()) (runDay xs (Pure x))
  Lift f y ys -> Lift (\x (y,z) -> (y, f x z)) y (runDay xs ys)

As expected, this gives us the clean implementation of a breadth-first traversal with the right asymptotics (I think):

bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
bft f = runQueue . go
  where
    go (x :& xs) = liftA2 (:&) (now (f x)) (later (traverse go xs))

(it’s worth pointing out that we haven’t actually used the applicative instance on the free applicative at any point: we have inlined all of the “zipping” to make it absolutely clear that everything has stayed linear).

So what’s the Theory?

I have yet to really dive deep on any of the theory involved in this type, I just quickly wrote up this post when I realised I was able to use the cayley transform from the mentioned papers to implement the proper breadth-first traversal. It certainly seems worth looking at more!

References

Capriotti, Paolo, and Ambrus Kaposi. 2014. “Free Applicative Functors.” Electronic Proceedings in Theoretical Computer Science 153 (June): 2–30. doi:10.4204/EPTCS.153.2. http://www.paolocapriotti.com/assets/applicative.pdf.

Easterly, Noah. 2019. “Functions and newtype wrappers for traversing Trees: Rampion/tree-traversals.” https://github.com/rampion/tree-traversals.

Gibbons, Jeremy. 2015. “Breadth-First Traversal.” Patterns in Functional Programming. https://patternsinfp.wordpress.com/2015/03/05/breadth-first-traversal/.

Rivas, Exequiel, and Mauro Jaskelioff. 2014. “Notions of Computation as Monoids.” arXiv:1406.4823 [cs, math] (May). http://arxiv.org/abs/1406.4823.

Rivas, Exequiel, Mauro Jaskelioff, and Tom Schrijvers. 2015. “From monoids to near-semirings: The essence of MonadPlus and Alternative.” In Proceedings of the 17th International Symposium on Principles and Practice of Declarative Programming, 196–207. ACM. doi:10.1145/2790449.2790514. http://www.fceia.unr.edu.ar/~mauro/pubs/FromMonoidstoNearsemirings.pdf.

van Laarhoven, Twan. 2009. “A non-regular data type challenge.” Twan van Laarhoven’s Blog. https://twanvl.nl/blog/haskell/non-regular1.

How to set up GitHub Actions for your Agda project

Donnacha Oisín Kidney — Wed, 18 Nov 2020 00:00:00 UT

Posted on November 18, 2020

Tags: Agda

Update 2022-11-12

The best approach to this now is probably to use this action, specifically set up for Agda:

github.com/wenkokke/setup-agda

I’ll leave the rest of this post here, but bear in mind the advice is outdated.

Recently travis-ci.org announced that they were closing down, and moving to travis-ci.com. For people who use the service, this basically means that the free component is going away, and you’ll have to pay in the future.

As a result, a lot of people are looking to move to another ci service, so I thought I’d put this short guide together on how to use GitHub actions to typecheck an Agda project and host the rendered code through GitHub pages. The system I have is quite fast: for a quite large project it takes about a minute from pushing for the action to complete.

If you just want to use the same script as me, you can see it here: the rest of this post will just be going through that script and explaining it.

Setting up a Basic Action

First things first: in order to make an action, you need to put a YAML file in the .github/workflows directory of your repository. You can have the following lines at the start:

name: Compile Agda and Deploy HTML
on:
  push:
    branches:
      - master

This gives a name for the action (which will show up in the actions tab online for the repo), and says that the action should be run whenever there’s a push to the branch named master.

We then list the “jobs” the actions does: just one for this action, called build:

jobs:
  build:

Configuring The Runner

GitHub actions run on GitHub’s servers, the specifications of which can be seen here. For this action we won’t need anything special, so we’ll just use the following:

    runs-on: ubuntu-18.04

Next we will have the matrix:

    strategy:
      matrix:
        cubical-ref: ["v0.2"]
        agda-ref: ["v2.6.1.1"]
        ghc-ver: ["8.10.2"]
        cabal-ver: ["3.4.0.0"]

I’m using this matrix as a crude system for environment variables; if this was a CI for some software I wanted to deploy, you could include multiple values for each variable here, to check that the whole thing runs properly with each.

Caching

We’re now onto the “steps” portion of the script, where we write small bash-esque script to be run. As such we have the line:

    steps:

The first step is to cache all the cabal packages we’re going to install. Agda takes about 45 minutes to install so this step is crucial:

    - uses: actions/cache@v2
      name: Cache cabal packages
      id: cache-cabal
      with:
        path: |
          ~/.cabal/packages
          ~/.cabal/store
          ~/.cabal/bin
          dist-newstyle
        key: ${{ runner.os }}-${{ matrix.ghc-ver }}-${{ matrix.cabal-ver }}-${{ matrix.agda-ref }}

The path field tells the action which folders to cache, the key field tells it what key to store them under.

Installing Agda

To install Agda we first need to install cabal:

    - name: Install cabal
      if: steps.cache-cabal.outputs.cache-hit != 'true'
      uses: actions/setup-haskell@v1.1.3
      with:
        ghc-version: ${{ matrix.ghc-ver }}
        cabal-version: ${{ matrix.cabal-ver }}

The if field here allows us to skip this step if we had a cache hit previously (i.e. if Agda is already installed).

Next we need to ensure that all of the programs installed by cabal are in the path:

    - name: Put cabal programs in PATH
      run: echo "~/.cabal/bin" >> $GITHUB_PATH

And then we download and install Agda (along with some dependencies that aren’t installed automatically):

    - name: Download Agda from github
      if: steps.cache-cabal.outputs.cache-hit != 'true'
      uses: actions/checkout@v2
      with:
        repository: agda/agda
        path: agda
        ref: ${{ matrix.agda-ref }}
      
    - name: Install Agda
      if: steps.cache-cabal.outputs.cache-hit != 'true'
      run: |
        cabal update
        cabal install --overwrite-policy=always --ghc-options='-O2 +RTS -M6G -RTS' alex-3.2.5
        cabal install --overwrite-policy=always --ghc-options='-O2 +RTS -M6G -RTS' happy-1.19.12
        cd agda
        mkdir -p doc
        touch doc/user-manual.pdf
        cabal install --overwrite-policy=always --ghc-options='-O1 +RTS -M6G -RTS'

The strange flags to cabal install here are probably necessary: I was running out of memory when I tried to install Agda without them. This might be fixed in future versions of Agda.

Installing Agda Dependencies

We next need to install any Agda libraries your code depends on. For instance, in my project, I use the cubical library: since Agda doesn’t have a package manager, we basically have to handle all the versioning and so on manually. Also, in order to speed up the build we have to cache the typecheck files for the library.

    - name: Checkout cubical library
      uses: actions/checkout@v2
      with:
        repository: agda/cubical
        path: cubical
        ref: ${{ matrix.cubical-ref }}

    - name: Cache cubical library
      uses: actions/cache@v2
      id: cache-cubical
      with:
        path: ~/cubical-build
        key: ${{ runner.os }}-${{ matrix.agda-ver }}-${{ matrix.cubical-ref }}

So the library is accessible as an import we need to put it in the Agda library list:

    - name: Put cubical library in Agda library list
      run: |
        mkdir -p ~/.agda/
        touch ~/.agda/libraries
        echo "$GITHUB_WORKSPACE/cubical/cubical.agda-lib" > ~/.agda/libraries

We then need to typecheck the library: this bit is a little tricky, since not all files in the cubical library actually typecheck.

    - name: Compile cubical library
      if: steps.cache-cubical.outputs.cache-hit != 'true'
      run: |
        cd $GITHUB_WORKSPACE/cubical
        agda Cubical/Core/Everything.agda
        agda Cubical/Foundations/Everything.agda
        find Cubical/Data -type f -name "*.agda" | while read -r code ; do
            agda $code
        done
        find Cubical/HITs -type f -name "*.agda" | while read -r code ; do
            agda $code
        done
        cp -f -r _build/ ~/cubical-build

Finally, if the cubical library was already typechecked then we don’t need to do any of that, and we instead just retrieve it from the cache:

    - name: Retrieve cubical library
      if: steps.cache-cubical.outputs.cache-hit == 'true'
      run: |
        mkdir -p cubical/_build
        cp -f -r ~/cubical-build/* cubical/_build

Typechecking the library

Finally we have to typecheck the library itself. We want to cache the output from this step as well, but importantly we want to support incremental recompilation: i.e. if we only make a small change in one file we don’t want to have to typecheck every other. We can do this with restore-keys in the cache:

    - name: Checkout main
      uses: actions/checkout@v2
      with:
        path: main

    - uses: actions/cache@v2
      name: Cache main library
      id: cache-main
      with:
        path: ~/main-build
        key: html-and-tex-${{ runner.os }}-${{ matrix.agda-ver }}-${{ matrix.cubical-ref }}-${{ hashFiles('main/**') }}
        restore-keys: |
          html-and-tex-${{ runner.os }}-${{ matrix.agda-ver }}-${{ matrix.cubical-ref }}-
          html-and-tex-${{ runner.os }}-${{ matrix.agda-ver }}-

    - name: Retrieve main library
      if: steps.cache-main.outputs.cache-hit == 'true'
      run: cp -f -R ~/main-build/* $GITHUB_WORKSPACE/main

Finally, we need to make an “Everything” file: this is an Agda module which contains an import for every module in the project. Typechecking this file is faster than typechecking each file individually.

    - name: Compile main library
      if: steps.cache-main.outputs.cache-hit != 'true'
      run: |
        mkdir -p ~/main-build/_build
        cp -f -R ~/main-build/_build $GITHUB_WORKSPACE/main/_build
        rm -r ~/main-build
        cd main
        find . -type f \( -name "*.agda" -o -name "*.lagda" \) > FileList
        sort -o FileList FileList
        echo "{-# OPTIONS --cubical #-}" > Everything.agda
        echo "" >> Everything.agda
        echo "module Everything where" >> Everything.agda
        echo "" >> Everything.agda
        echo "-- This file imports every module in the project. Click on" >> Everything.agda
        echo "-- a module name to go to its source." >> Everything.agda
        echo "" >> Everything.agda
        cat FileList | cut -c 3-               \
                     | cut -f1 -d'.'           \
                     | sed 's/\//\./g'         \
                     | sed 's/^/open import /' \
                     >> Everything.agda
        rm FileList
        agda --html --html-dir=docs Everything.agda
        rm Everything.agda
        cd ..
        cp -f -R main/ ~/main-build/

And then we need to deploy the generated html so we can see the rendered library.

    - name: Deploy html to github pages
      uses: peaceiris/actions-gh-pages@v3
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        publish_dir: main/docs

This last step will need you to turn on the github pages setting in your repository, and have it serve from the gh-pages branch.

Conclusion

Hopefully this script will be useful to some other people! The first time it runs it should take between 30 minutes and an hour; subsequently it takes about a minute for me.

Fun with Combinators

Donnacha Oisín Kidney — Sat, 17 Oct 2020 00:00:00 UT

Posted on October 17, 2020

Tags: Combinators

There are a bunch of “minimal” computational models out there: Turing machines, lambda calculus, PowerPoint (Wildenhain 2017), etc. These are radically simple languages which are nonetheless Turing complete, so theoretically “as powerful” as each other. Of those, lambda calculus is without question my favourite to actually write programs in: it’s the one which is closest to crawling out of the Turing tarpit.

In terms of implementation, though, it is far from simple. Lambda calculus has variables, which introduce huge complexity into the interpreter: especially if you want to do any kind of formal reasoning about programs, this complexity is a problem. We might want to reach for something even lower-level than lambda calculus: this is where combinator calculi come in.

You may have heard of SKI combinator calculus: it’s the “simplest” of the calculi, but it’s not actually very easy to understand, and it’s absolute murder to try use. So we’re going to start with BCKW, a more obscure calculus, actually invented by Haskell Curry.

There are 4 combinators in BCKW: B, C, K, and W (shocking, I know). You can think about these combinators as functions which manipulate the beginning of strings:

Bxyz ~> x(yz)
Cxyz ~> xzy
Kxy  ~> x
Wxy  ~> xyy

Upper case letters are combinators, lower-case are variables. Yes, yes, I know I said that combinator calculi didn’t need variables, and it doesn’t! I’m just using them here to explain how each of the combinators work. If you really want to be pedantic you can think of the lower case letters as notational placeholders meaning “any given combinator”. They won’t exist in any actual programs we write.

Let’s work with some examples to get a sense for how these combinators work.

The simplest combinator is K: it’s actually equivalent to the const function from Haskell. It discards its second argument, and returns the first. If you give a combinator more arguments than it usually accepts, you just keep the extra arguments in the output:

Kxyz ~> xz

W is the next combinator: it duplicates its second argument.

Wxy ~> xyy

We always start from the left, applying the rule for the left-most combinator first.

WKxyz ~> Kxxyz ~> xyz
KWxyz ~> Wyz   ~> yzz

Next we have C: this is equivalent to the Haskell function flip. It swaps the second and third arguments:

Cxyz ~> xzy

Here’s a small little evaluator for expressions which use C, K, and W. You can edit the expression, and press enter to step through it.

The last combinator introduces parentheses, and it’s equivalent to function composition.

Bxyz ~> x(yz)

You can write parentheses yourself: implicitly, all expressions are left-associative. That means that the following are all equal:

xyz = (xy)z = (x)yz = ((x)y)z

But xyz is not equal to, say, x(yz).

And here’s a puzzle to start flexing your combinator skills: one of the combinators in SKI combinator calculus is I, which is the identity function.

Ix ~> x

Try write an expression which functions the same way as I, using only the BCKW combinators. Use the following evaluator to try and figure out how to do it: write an expression after λ> which functions the same as I.

Answer

Some More List Algorithms

Donnacha Oisín Kidney — Sat, 22 Aug 2020 00:00:00 UT

Posted on August 22, 2020

Tags: Haskell

It’s been a while since I last wrote a post (I’ve been busy with my Master’s thesis, which is nearly done), so I thought I would quickly throw out some fun snippets of Haskell I had reason to write over the past couple of weeks.

Zipping With Folds

For some reason, until recently I had been under the impression that it was impossible to fuse zips efficiently. In other words, I thought that zip was like tail, in that if it was implemented using only foldr it would result in an asymptotic slowdown (tail is normally $\mathcal{O}(1)$ , implemented as a fold it’s $\mathcal{O}(n)$ ).

Well, it seems like this is not the case. The old zip-folding code I had looks to me now to be the correct complexity: it’s related to How To Zip Folds, by Oleg Kiselyov (although I’m using a different version of the function which can be found on the mailing list). The relevant code is as follows:

newtype Zip a b = 
  Zip { runZip :: a -> (Zip a b -> b) -> b }

zip :: [a] -> [b] -> [(a,b)]
zip xs ys = foldr xf xb xs (Zip (foldr yf yb ys))
  where
    xf x xk yk = runZip yk x xk
    xb _ = []
    
    yf y yk x xk = (x,y) : xk (Zip yk)
    yb _ _ = []

There are apparently reasons for why the Prelude’s zip isn’t allowed to fuse both of its arguments: I don’t fully understand them, however. (in particular the linked page says that the fused zip would have different strictness behaviour, but the version I have above seems to function properly).

This version of zip leads to some more fun solutions to folding puzzles, like this one:

Write a function that is equivalent to:
zipFromEnd xs ys = reverse (zip (reverse xs) (reverse ys))
Without creating any intermediate lists.

The desired function is interesting in that, instead of lining up lists according to their first elements, it aligns them according to the ends.

>>> zipFromEnd [1,2,3] "abc"
[(1,'a'),(2,'b'),(3,'c')]

>>> zipFromEnd [1,2,3] "abcd"
[(1,'b'),(2,'c'),(3,'d')]

>>> zipFromEnd [1,2,3,4] "abc"
[(2,'a'),(3,'b'),(4,'c')]

The solution here is just to use foldl, and we get the following:

zipFromEnd :: [a] -> [b] -> [(a,b)]
zipFromEnd xs ys = foldl xf xb xs (Zip (foldl yf yb ys)) []
  where
    xf xk x yk = runZip yk x xk
    xb _ zs = zs
    
    yf yk y x xk zs = xk (Zip yk) ((x,y) : zs)
    yb _ _ zs = zs

Another function which is a little interesting is the “zip longest” function:

zipLongest :: (a -> a -> a) -> [a] -> [a] -> [a]
zipLongest c xs ys = foldr xf xb xs (Zip (foldr yf yb ys))
  where
    xf x xk yk = runZip yk (Just x) xk
    xb zs = runZip zs Nothing xb
    
    yf y yk Nothing  xk =     y : xk (Zip yk)
    yf y yk (Just x) xk = c x y : xk (Zip yk)
    
    yb Nothing  _  = []
    yb (Just x) zs = x : zs (Zip yb)

Finally, all of these functions rely on the Zip type, which is not strictly positive. This means that we can’t use it in Agda, and it’s tricky to reason about: I wonder what it is about functions for deforestation that tends to lead to non-strictly-positive datatypes.

Lexicographic Permutations

The next puzzle I was interested in was finding the next lexicographic permutation of some string. In other words, given some string $s$ , you need to find another string $t$ that is a permutation of $s$ such that $s < t$ , and that there is no string $u$ that is a permutation of $s$ and $s < u < t$ . The Wikipedia article on the topic is excellent (and clear), but again the algorithm is described in extremely imperative terms:

Find the largest index k such that a[k] < a[k + 1]. If no such index exists, the permutation is the last permutation.

Find the largest index l greater than k such that a[k] < a[l].

Swap the value of a[k] with that of a[l].

Reverse the sequence from a[k + 1] up to and including the final element a[n].

The challenge here is to write this algorithm without doing any indexing: indexing is expensive on Haskell lists, and regardless it is cleaner to express it without.

I managed to work out the following:

nextLexPerm :: Ord a => [a] -> Maybe [a]
nextLexPerm []     = Nothing
nextLexPerm (x:xs) = go1 x xs
  where
    go1 _ []     = Nothing
    go1 i (j:xs) = maybe (go2 i j [] xs) (Just . (i:)) (go1 j xs)

    go2 i j xs ys
      | j <= i    = Nothing
      | otherwise = Just (fromMaybe (j : foldl (flip (:)) (i:xs) ys) (go3 i (j:xs) ys))

    go3 _ _  []     = Nothing
    go3 i xs (j:ys) = go2 i j xs ys

Circular Sorting

This comes from the Rosetta Code problem Circle Sort. This is a strange little sorting algorithm, where basically you compare elements on opposite sides of an array, swapping them as needed. The example given is the following:

6 7 8 9 2 5 3 4 1

First we compare (and swap) 6 and 1, and then 7 and 4, and so on, until we reach the middle. At this point we split the array in two and perform the procedure on each half. After doing this once it is not the case that the array is definitely sorted: you may have to repeat the procedure several (but finitely many) times, until no swaps are performed.

I have absolutely no idea what the practical application for such an odd algorithm would be, but it seemed like an interesting challenge to try implement it in a functional style (i.e. without indices or mutation).

The first thing we have to do is fold the list in half, so we pair up the right items. We’ve actually seen an algorithm to do this before: it’s often called the “tortoise and the hare”, and our previous use was to check if a list was a palindrome. Here’s how we implement it:

halve :: [a] -> [(a,a)]
halve xs = snd (go xs xs)
  where
    go (y:ys) (_:_:zs) = f y (go ys zs)
    go (_:ys) [_]      = (ys,[])
    go ys     []       = (ys,[])
    
    f x (y:ys,zs) = (ys, (x,y) : zs)
    
>>> halve [6,7,8,9,2,5,3,4,1]
[(6,1),(7,4),(8,3),(9,5)]

Notice that the 2 in the very middle of the list is missing from the output: I’ll describe how to handle that element later on. In the above piece of code, that 2 actually gets bound to the underscore (in (_:ys)) in the second clause of go.

Next we need to do the actual swapping: this is actually pretty straightforward, if we think of the algorithm functionally, rather than imperatively. Instead of swapping things in place, we are building up both halves of the new list, so the “swap” operation should simply decide which list each item goes into.

halve :: Ord a => [a] -> ([a],[a])
halve xs = tl (go xs xs)
  where
    tl (_,lte,gt) = (lte,gt)
    
    go (y:ys) (_:_:zs) = swap y (go ys zs)
    go (_:ys) [_]      = (ys,[],[])
    go ys     []       = (ys,[],[])
    
    swap x (y:ys,lte,gt) 
      | x <= y    = (ys, x : lte, y : gt)
      | otherwise = (ys, y : lte, x : gt)

At this point we can also see what to do with the middle item: we’ll put it in the higher or lower list, depending on a comparison with the element it’s next to.

halve :: Ord a => [a] -> ([a],[a])
halve xs = tl (go xs xs)
  where
    tl (_,lte,gt) = (lte,gt)
    
    go (y:ys) (_:_:zs) = swap y (go ys zs)
    go ys     []       = (ys,[],[])
    go (y:ys) [_]      = (ys,[y | e],[y | not e])
      where e = y <= head ys
    
    swap x (y:ys,lte,gt) 
      | x <= y    = (ys, x : lte, y : gt)
      | otherwise = (ys, y : lte, x : gt)

Next, we can use this as a helper function in the overall recursive function.

circleSort :: Ord a => [a] -> [a]
circleSort [] = []
circleSort [x] = [x]
circleSort xs =
  let (lte,gt) = halve xs
  in circleSort lte ++ circleSort (reverse gt)

This function isn’t correct (yet). As we mentioned already, we need to run the circle sort procedure multiple times until no swaps occur. We can add in the tracking of swaps like so:

circleSort :: Ord a => [a] -> [a]
circleSort xs = if swapped then circleSort ks else ks
  where
    (swapped,ks) = go xs
    
    go []  = (False, [])
    go [x] = (False, [x])
    go xs  =
      let (s,_,lte,gt) = halve xs xs
          (sl,lte') = go lte
          (sg,gt' ) = go (reverse gt)
      in (s || sl || sg, lte' ++ gt')
      
    halve (y:ys) (_:_:zs) = swap y (halve ys zs)
    halve ys     []       = (False,ys,[],[])
    halve (y:ys) [_]      = (False,ys,[y | e],[y | not e])
      where e = y <= head ys
      
    swap x (s,y:ys,lte,gt) 
      | x <= y    = (s   ,ys, x : lte, y : gt)
      | otherwise = (True,ys, y : lte, x : gt)

So at this point we actually have a working implementation of the function, which avoids indices as intended. It has some problems still, though. First, we call ++, when we could be using difference lists. Here’s the solution to that:

circleSort :: Ord a => [a] -> [a]
circleSort xs = if swapped then circleSort ks else ks
  where
    (swapped,ks) = go xs []
    
    go []  zs = (False, zs)
    go [x] zs = (False, x:zs)
    go xs  zs =
      let (s,_,lte,gt) = halve xs xs
          (sl,lte') = go lte gt'
          (sg,gt' ) = go (reverse gt) zs
      in (s || sl || sg, lte')
      
    halve (y:ys) (_:_:zs) = swap y (halve ys zs)
    halve ys     []       = (False,ys,[],[])
    halve (y:ys) [_]      = (False,ys,[y | e],[y | not e])
      where e = y <= head ys
      
    swap x (s,y:ys,lte,gt) 
      | x <= y    = (s   ,ys, x : lte, y : gt)
      | otherwise = (True,ys, y : lte, x : gt)

Next we can actually rewrite the go function to allow for a certain amount of tail recursion (kind of):

circleSort :: Ord a => [a] -> [a]
circleSort xs = if swapped then circleSort ks else ks
  where
    (swapped,ks) = go xs (False,[])
    
    go []  (s,ks) = (s,ks)
    go [x] (s,ks) = (s,x:ks)
    go xs  (s,ks) =
      let (s',_,ls,rs) = halve s xs xs
      in go ls (go (reverse rs) (s',ks))
 
    halve s (y:ys) (_:_:zs) = swap y (halve s ys zs)
    halve s ys     []       = (s,ys,[],[])
    halve s (y:ys) [_]      = (s,ys,[y | e],[y | not e])
      where e = y <= head ys
 
    swap x (s,y:ys,ls,rs)
      | x <= y    = (   s,ys,x:ls,y:rs)
      | otherwise = (True,ys,y:ls,x:rs)

Next, we call reverse: but we can avoid the reverse by passing a parameter which tells us which direction we’re walking down the list. Since the swapping logic is symmetric, we’re able to just invert some of the functions. It is a little tricky, though:

circleSort :: Ord a => [a] -> [a]
circleSort xs = if swapped then circleSort ks else ks
  where
    (swapped,ks) = go False xs (False,[])
    
    go d []  (s,ks) = (s,ks)
    go d [x] (s,ks) = (s,x:ks)
    go d xs  (s,ks) =
      let (s',_,ls,rs) = halve d s xs xs
      in go False ls (go True rs (s',ks))
 
    halve d s (y:ys) (_:_:zs) = swap d y (halve d s ys zs)
    halve d s ys     []       = (s,ys,[],[])
    halve d s (y:ys) [_]      = (s,ys,[y | e],[y | not e])
      where e = y <= head ys
 
    swap d x (s,y:ys,ls,rs)
      | bool (<=) (<) d x y = (    d || s,ys,x:ls,y:rs)
      | otherwise           = (not d || s,ys,y:ls,x:rs)

So there it is! The one-pass, purely function implementation of circle sort. Very possibly the most useless piece of code I’ve ever written.

Presentation on Purely Functional Data Structures

Donnacha Oisín Kidney — Tue, 19 May 2020 00:00:00 UT

Posted on May 19, 2020

Tags: Haskell

A week or so ago I gave a presentation on purely functional data structures as part of an interview¹. Here are the slides:

https://doisinkidney.com/pdfs/purely-functional-data-structures-slides.pdf

The presentation is meant to be about 45 minutes long, and it’s aimed at end of first year computer science students who have done some Haskell and know a little bit about pointers.

The interview went well, by the way! All going well with my master’s I’ll be starting a PhD in Imperial in Nicolas Wu’s group this October.↩︎

More Random Access Lists

Donnacha Oisín Kidney — Sat, 02 May 2020 00:00:00 UT

Posted on May 2, 2020

Part 2 of a 2-part series on Random Access Lists

Tags: Haskell, Agda

Imports and Pragmas

Another Breadth-First Traversal

Donnacha Oisín Kidney — Thu, 20 Feb 2020 00:00:00 UT

Posted on February 20, 2020

Part 9 of a 10-part series on Breadth-First Traversals

Tags: Haskell

This post will be quite light on details: I’m trying to gather up all of the material in this series to be a chapter in my Master’s thesis, so I’m going to leave the heavy-duty explanations and theory for that. Once finished I will probably do a short write up on this blog.

That said, the reason I’m writing this post is that in writing my thesis I figured out a nice way to solve the problem I first wrote about in this post. I won’t restate it in its entirety, but basically we’re looking for a function with the following signature:

bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)

Seasoned Haskellers will recognise it as a “traversal”. However, this shouldn’t be an ordinary traversal: that, after all, can be derived automatically by the compiler these days. Instead, the Applicative effects should be evaluated in breadth-first order. To put it another way, if we have a function which lists the elements of a tree in breadth-first order:

bfs :: Tree a -> [a]

Then we should have the following identity:

bft (\x -> ([x], x)) t = (bfs t, t)

Using the writer Applicative with the list monoid here as a way to talk about ordering of effects.

There are many solutions to the puzzle (see Gibbons 2015; or Easterly 2019, or any of the posts in this series), but I had found them mostly unsatisfying. They basically relied on enumerating the tree in breadth-first order, running the traversal on the intermediate list, and then rebuilding the tree. It has the correct time complexity and so on, but it would be nice to deforest the intermediate structure a little bit more.

Anyways, the function I finally managed to get is the following:

bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
bft f (x :& xs) = liftA2 (:&) (f x) (bftF f xs)

bftF :: Applicative f => (a -> f b) -> [Tree a] -> f [Tree b]
bftF t = fmap head . foldr (<*>) (pure []) . foldr f [pure ([]:)]
  where
    f (x :& xs) (q : qs) = liftA2 c (t x) q : foldr f (p qs) xs
    
    p []     = [pure ([]:)]
    p (x:xs) = fmap (([]:).) x : xs

    c x k (xs : ks) = ((x :& xs) : y) : ys
      where (y : ys) = k ks

The Tree is defined like so:

data Tree a = a :& [Tree a]

It has all the right properties (complexity, etc.), and if you stick tildes before every irrefutable pattern-match it is also maximally lazy.

As a bonus, here’s another small function I looked at for my thesis. It performs a topological sort of a graph.

type Graph a = a -> [a]

topoSort :: Ord a => Graph a -> [a] -> [a]
topoSort g = fst . foldr f ([], ∅)
  where
    f x (xs,s) 
      | x ∈ s = (xs,s)
      | x ∉ s = first (x:) (foldr f (xs, {x} ∪ s) (g x))

References

Easterly, Noah. 2019. “Functions and newtype wrappers for traversing Trees: Rampion/tree-traversals.” https://github.com/rampion/tree-traversals.

Gibbons, Jeremy. 2015. “Breadth-First Traversal.” Patterns in Functional Programming. https://patternsinfp.wordpress.com/2015/03/05/breadth-first-traversal/.

Typing TABA

Donnacha Oisín Kidney — Sat, 15 Feb 2020 00:00:00 UT

Posted on February 15, 2020

Tags: Haskell

Just a short one again today!

There’s an excellent talk by Kenneth Foner at Compose from 2016 which goes through a paper by Danvy and Goldberg (2005) called “There and Back Again” (or TABA). You should watch the talk and read the paper if you’re in any way excited by the weird and wonderful algorithms we use in functional languages to do simple things like reversing a list.

The function focused on in the paper is one which does the following:

zipRev :: [a] -> [b] -> [(a,b)]
zipRev xs ys = zip xs (reverse ys)

But does it in one pass, without reversing the second list. It uses a not-insignificant bit of cleverness to do it, but you can actually arrive at the same solution in a pretty straightforward way by aggressively converting everything you can to a fold. The result is the following:

zipRev :: [a] -> [b] -> [(a,b)]
zipRev xs ys = foldl f b ys xs
  where
    b _ = []
    f k y (x:xs) = (x,y) : k xs

I have written a little more on this function and the general technique before.

The talk goes through the same stuff, but takes a turn then to proving the function total: our version above won’t work correctly if the lists don’t have the same length, so it would be nice to provide that guarantee in the types somehow. Directly translating the version from the TABA paper into one which uses length-indexed vectors will require some nasty, expensive proofs, though, which end up making the whole function quadratic. The solution in the talk is to call out to an external solver which gives some extremely slick proofs (and a very nice interface). However, yesterday I realised you needn’t use a solver at all: you can type the Haskell version just fine, and you don’t even need the fanciest of type-level features.

As ever, the solution is another fold. To demonstrate this rather short solution, we’ll first need the regular toolbox of types:

data Nat = Z | S Nat

data Vec (a :: Type) (n :: Nat) where
  Nil :: Vec a Z
  (:-) :: a -> Vec a n -> Vec a (S n)

And now we will write a length-indexed left fold on this vector. The key trick here is that the type passed in the recursive call changes, by composition:

newtype (:.:) (f :: b -> Type) (g :: a -> b) (x :: a) = Comp { unComp :: f (g x) }

Safe coercions will let us use the above type safely without a performance hit, resulting in the following linear-time function:

foldlVec :: forall a b n. (forall m. a -> b m -> b (S m)) -> b Z -> Vec a n -> b n
foldlVec f b Nil = b
foldlVec f b (x :- xs) = unComp (foldlVec (c f) (Comp (f x b)) xs)
  where
    c :: (a -> b (S m) -> b (S (S m))) -> (a -> (b :.: S) m -> (b :.: S) (S m))
    c = coerce
    {-# INLINE c #-}

We can use this function to write vector reverse:

reverseVec :: Vec a n -> Vec a n
reverseVec = foldlVec (:-) Nil

Now, to write the reversing zip, we need another newtype to put the parameter in the right place, but it is straightforward other than that.

newtype VecCont a b n = VecCont { runVecCont :: Vec a n -> Vec (a,b) n }

revZip :: Vec a n -> Vec b n -> Vec (a,b) n
revZip = flip $ runVecCont . 
  foldlVec
      (\y k -> VecCont (\(x :- xs) -> (x,y) :- runVecCont k xs))
      (VecCont (const Nil))

Danvy, Olivier, and Mayer Goldberg. 2005. “There and Back Again.” Fundamenta Informaticae 66 (4) (December): 397–413. https://cs.au.dk/~danvy/DSc/08_danvy-goldberg_fi-2005.pdf.

Terminating Tricky Traversals

Donnacha Oisín Kidney — Wed, 29 Jan 2020 00:00:00 UT

Posted on January 29, 2020

Part 8 of a 10-part series on Breadth-First Traversals

Tags: Agda, Haskell

Imports

Lazy Constructive Numbers and the Stern-Brocot Tree

Donnacha Oisín Kidney — Sat, 14 Dec 2019 00:00:00 UT

Posted on December 14, 2019

Tags: Haskell, Agda

In dependently typed languages, it’s often important to figure out a good “low-level” representation for some concept. The natural numbers, for instance:

data Nat = Z | S Nat

For “real” applications, of course, these numbers are offensively inefficient, in terms of both space and time. But that’s not what I’m after here: I’m looking for a type which best describes the essence of the natural numbers, and that can be used to prove and think about them. In that sense, this representation is second to none: it’s basically the simplest possible type which can represent the naturals.

Let’s nail down that idea a little better. What do we mean when a type is a “good” representation for some concept.

There should be no redundancy. The type for the natural numbers above has this property: every natural number as one (and only one) canonical representative in Nat. Compare that to the following possible representation for the integers:
```
data Int = Neg Nat | Pos Nat
```
There are two ways to represent 0 here: as Pos Z or Neg Z.

Of course, you can quotient out the redundancy in Cubical Agda, or normalise on construction every time, but either of these workarounds gets your representation a demerit.
Operations should be definable simply and directly on the representation. Points docked for converting to and from some non-normalised form.
That conversion, however, can exist, and ideally should exist, in some fundamental way. You should be able to establish an efficient isomorphism with other representations of the same concept.
Properties about the type should correspond to intuitive properties about the representation. For Nat above, this means things like order: the usual order on the natural numbers again has a straightforward analogue on Nat.

With that laundry list of requirements, it’s no wonder that it’s often tricky to figure out the “right” type for a concept.

In this post, I’m going to talk about a type for the rational numbers, and I’m going to try satisfy those requirements as best I can.

The Rationals as a Pair of Numbers

Our first attempt at representing the rationals might use a fraction:

data Frac = Integer :/ Integer

This obviously fails the redundancy property. The fractions $\frac{1}{2}$ and $\frac{2}{4}$ represent the same number, but have different underlying values.

So the type isn’t suitable as a potential representation for the rationals. That’s not to say that this type is useless: far from it! Indeed, Haskell’s Data.Ratio uses something quite like this to implement rationals.

If you’re going to deal with redundant elements, there are two broad ways to deal with it. Data.Ratio’s approach is to normalise on construction, and only export a constructor which does this. This gives you a pretty good guarantee that there won’t be any unreduced fractions lying around in you program. Agda’s standard library also uses an approach like this, although the fact that the numerator and denominator are coprime is statically verified by way of a proof carried in the type.

The other way to deal with redundancy is by quotient. In Haskell, that kind of means doing the following:

instance Eq Frac where
  (x :/ xd) == (y :/ yd) = (x * yd) == (y * xd)
  
instance Ord Frac where
  compare (x :/ xd) (y :/ yd) = compare (x * yd) (y * xd)

We don’t have real quotient types in Haskell, but this gets the idea across: we haven’t normalised our representation internally, but as far as anyone using the type is concerned, they shouldn’t be able to tell the difference between $\frac{1}{2}$ and $\frac{2}{4}$ .

Th Num instance is pretty much just a restating of the axioms for fractions.

Num instance for Frac.

A Small Proof that Fin is Injective

Donnacha Oisín Kidney — Fri, 15 Nov 2019 00:00:00 UT

Posted on November 15, 2019

Tags: Agda

Imports etc.

How to do Binary Random-Access Lists Simply

Donnacha Oisín Kidney — Sat, 02 Nov 2019 00:00:00 UT

Posted on November 2, 2019

Part 1 of a 2-part series on Random Access Lists

Tags: Agda

“Heterogeneous Random-Access Lists” by Wouter Swierstra (2019) describes how to write a simple binary random-access list (Okasaki 1995) to use as a heterogeneous tuple. If you haven’t tried to implement the data structure described in the paper before, you might not realise the just how elegant the implementation is. The truth is that arriving at the definitions presented is difficult: behind every simple function is a litany of complex and ugly alternatives that had to be tried and discarded first before settling on the final answer.

In this post I want to go through a very similar structure, with special focus on the “wrong turns” in implementation which can lead to headache.

Two Proofs on ℕ, and How to Avoid Them

Here are a couple of important identities on ℕ:

+0 : ∀ n → n + zero ≡ n
+0 zero    = refl
+0 (suc n) = cong suc (+0 n)

+-suc : ∀ n m → n + suc m ≡ suc n + m
+-suc zero    m = refl
+-suc (suc n) m = cong suc (+-suc n m)

These two show up all the time as proof obligations from the compiler (i.e. “couldn’t match type n + suc m with suc n + m”). The solution is obvious, right? subst in one of the proofs above and you’re on your way. Wait! There might be a better way.

We’re going to look at reversing a vector as an example. We have a normal-looking length-indexed vector:

infixr 5 _∷_
data Vec (A : Set a) : ℕ → Set a where
  [] : Vec A zero
  _∷_ : A → Vec A n → Vec A (suc n)

Reversing a list is easy: we do it the standard way, in $\mathcal{O}(n)$ time, with an accumulator:

list-reverse : List A → List A
list-reverse = go []
  where
  go : List A → List A → List A
  go acc [] = acc
  go acc (x ∷ xs) = go (x ∷ acc) xs

Transferring over to a vector and we see our friends +-suc and +0.

vec-reverse₁ : Vec A n → Vec A n
vec-reverse₁ xs = subst (Vec _) (+0 _) (go [] xs)
  where
  go : Vec A n → Vec A m → Vec A (m + n)
  go acc [] = acc
  go acc (x ∷ xs) = subst (Vec _) (+-suc _ _) (go (x ∷ acc) xs)

The solution, as with so many things, is to use a fold instead of explicit recursion. Folds on vectors are a little more aggressively typed than those on lists:

vec-foldr : (B : ℕ → Type b)
          → (∀ {n} → A → B n → B (suc n))
          → B zero
          → Vec A n
          → B n
vec-foldr B f b [] = b
vec-foldr B f b (x ∷ xs) = f x (vec-foldr B f b xs)

We allow the output type to be indexed by the list of the vector. This is a good thing, bear in mind: we need that extra information to properly type reverse.

For reverse, unfortunately, we need a left-leaning fold, which is a little trickier to implement than vec-foldr.

vec-foldl : (B : ℕ → Set b)
          → (∀ {n} → B n → A → B (suc n))
          → B zero
          → Vec A n
          → B n
vec-foldl B f b [] = b
vec-foldl B f b (x ∷ xs) = vec-foldl (B ∘ suc) f (f b x) xs

With this we can finally reverse.

vec-reverse : Vec A n → Vec A n
vec-reverse = vec-foldl (Vec _) (λ xs x → x ∷ xs) []

The real trick in this function is that the type of the return value changes as we fold. If you think about it, it’s the same optimisation that we make for the $\mathcal{O}(n)$ reverse on lists: the B type above is the “difference list” in types, allowing us to append on to the end without $\mathcal{O}(n^2)$ proofs.

As an aside, this same trick can let us type the convolve-TABA (Danvy and Goldberg 2005; Foner 2016) function quite simply:

convolve : Vec A n → Vec B n → Vec (A × B) n
convolve =
  vec-foldl
    (λ n → Vec _ n → Vec _ n)
    (λ { k x (y ∷ ys) → (x , y) ∷ k ys})
    (λ _ → [])

Binary Numbers

Binary numbers come up a lot in dependently-typed programming languages: they offer an alternative representation of ℕ that’s tolerably efficient (well, depending on who’s doing the tolerating). In contrast to the Peano numbers, though, there are a huge number of ways to implement them.

I’m going to recommend one particular implementation over the others, but before I do I want to define a function on ℕ:

2* : ℕ → ℕ
2* zero = zero
2* (suc n) = suc (suc (2* n))

In all of the implementations of binary numbers we’ll need a function like this. It is absolutely crucial that it is defined in the way above: the other obvious definition (2* n = n + n) is a nightmare for proofs.

Right, now on to some actual binary numbers. The obvious way (a list of bits) is insufficient, as it allows multiple representations of the same number (because of the trailing zeroes). Picking a more clever implementation is tricky, though. One way splits it into two types:

module OneTerminated where
  infixl 5 _0ᵇ _1ᵇ
  infixr 4 𝕓_

  data 𝔹⁺ : Set where
    1ᵇ : 𝔹⁺
    _0ᵇ _1ᵇ : 𝔹⁺ → 𝔹⁺

  data 𝔹 : Set where
    𝕓0ᵇ : 𝔹
    𝕓_ : 𝔹⁺ → 𝔹

𝔹⁺ is the strictly positive natural numbers (i.e. the naturals starting from 1). 𝔹 adds a zero to that set. This removes the possibility for trailing zeroes, thereby making this representation unique for every natural number.

Evaluation is pretty standard

What is Good About Haskell?

Donnacha Oisín Kidney — Wed, 02 Oct 2019 00:00:00 UT

Posted on October 2, 2019

Tags: Haskell

Update 5/10/2019: check the bottom of this post for some links to comments and discussion.

Beginners to Haskell are often confused as to what’s so great about the language. Much of the proselytizing online focuses on pretty abstract (and often poorly defined) concepts like “purity”, “strong types”, and (god forbid) “monads”. These things are difficult to understand, somewhat controversial, and not obviously beneficial (especially when you’ve only been using the language for a short amount of time).

The real tragedy is that Haskell (and other ML-family languages) are packed with simple, decades-old features like pattern matching and algebraic data types which have massive, clear benefits and few (if any) downsides. Some of these ideas are finally filtering in to mainstream languages (like Swift and Rust) where they’re used to great effect, but the vast majority of programmers out there haven’t yet been exposed to them.

This post aims to demonstrate some of these features in a simple (but hopefully not too simple) example. I’m going to write and package up a simple sorting algorithm in both Haskell and Python, and compare the code in each. I’m choosing Python because I like it and beginners like it, but also because it’s missing most of the features I’ll be demonstrating. It’s important to note I’m not comparing Haskell and Python as languages: the Python code is just there as a reference for people less familiar with Haskell. What’s more, the comparison is unfair, as the example deliberately plays to Haskell’s strengths (so I can show off the features I’m interested in): it wouldn’t be difficult to pick an example that makes Python look good and Haskell look poor.

This post is not meant to say “Haskell is great, and your language sucks”! It’s not even really about Haskell: much of what I’m talking about here applies equally well to Ocaml, Rust, etc. I’m really writing this as a response to the notion that functional features are somehow experimental, overly complex, or ultimately compromised. As a result of that idea, I feel like these features are left out of a lot of modern languages which would benefit from them. There exists a small set of simple, battle-tested PL ideas, which have been used for nearly forty years now: this post aims to demonstrate them, and argue for their inclusion in every general-purpose programming language that’s being designed today.

The Algorithm

We’ll be using a skew heap to sort lists in both languages. The basic idea is to repeatedly insert stuff into the heap, and then repeatedly “pop” the smallest element from the heap until it’s empty. It’s not in-place, but it is $\mathcal{O}(n \log n)$ , and actually performs pretty well in practice.

A Tree

A Skew Heap is represented by a binary tree:

data Tree a
  = Leaf
  | Node a (Tree a) (Tree a)

class Tree:
  def __init__(self, is_node, data, lchild, rchild):
    self._is_node = is_node
    self._data = data
    self._lchild = lchild
    self._rchild = rchild
    
def leaf():
  return Tree(False, None, None, None)

def node(data, lchild, rchild):
  return Tree(True, data, lchild, rchild)

I want to point out the precision of the Haskell definition: a tree is either a leaf (an empty tree), or a node, with a payload and two children. There are no special cases, and it took us one line to write (spread to 3 here for legibility on smaller screens).

In Python, we have to write a few more lines¹. This representation uses the _is_node field is False for an empty tree (a leaf). If it’s True, the other fields are filled. We write some helper functions to give us constructors like the leaf and node ones for the Haskell example.

This isn’t the standard definition of a binary tree in Python, in fact it might looks a little weird to most Python people. Let’s run through some alternatives and their issues.

The standard definition:

class Tree:
  def __init__(self, data, lchild, rchild):
    self._data = data
    self._lchild = lchild
    self._rchild = rchild

Instead of having a separate field for “is this a leaf or a node”, the empty tree is simply None:

def leaf():
    return None

With this approach, if we define any methods on a tree, they won’t work on the empty tree!

>>> leaf().size()
AttributeError: 'NoneType' object has no attribute 'size'

We’ll do inheritance! Python even has a handy abc library to help us with some of this:

from abc import ABC, abstractmethod

class Tree(ABC):
    @abstractmethod
    def size(self):
        raise NotImplemented

class Leaf(Tree):
    def __init__(self):
        pass

    def size(self):
        return 0

class Node(Tree):
    def __init__(self, data, lchild, rchild):
        self._data = data
        self._lchild = lchild
        self._rchild = rchild

    def size(self):
        return 1 + self._lchild.size() + self._rchild.size()

Methods will now work on an empty tree, but we’re faced with 2 problems: first, this is very verbose, and pretty complex. Secondly, we can’t write a mutating method which changes a tree from a leaf to a node. In other words, we can’t write an insert method!

We won’t represent a leaf as the whole tree being None, just the data!
```
def leaf():
    return Tree(None, None, None)
```
This (surprisingly) pops up in a few places. While it solves the problem of methods, and the mutation problem, it has a serious bug. We can’t have None as an element in the tree! In other words, if we ask our eventual algorithm to sort a list which contains None, it will silently discard some of the list, returning the wrong answer.

There are yet more options (using a wrapper class), none of them ideal. Another thing to point out is that, even with our definition with a tag, we can only represent types with 2 possible states. If there was another type of node in the tree, we couldn’t simply use a boolean tag: we’d have to switch to integers (and remember the meaning of each integer), or strings! Yuck!

What Python is fundamentally missing here is algebraic data types. This is a way of building up all of your types out of products (“my type has this and this”) and sums (“my type is this or this”). Python can do products perfectly well: that’s what classes are. The tree class itself is the product of Bool, data, Tree, and Tree. However it’s missing an entire half of the equation! This is why you just can’t express binary trees as cleanly as you can in Swift, Haskell, OCaml, etc. Python, as well as a host of other languages like Go, Java, etc, will let you express one kind of “sum” type: “or nothing” (the null pointer). However, it’s clunky and poorly handled in all of those languages (the method problems above demonstrate the issues in Python), and doesn’t work for anything other than that one special case.

Again, there’s nothing about algebraic data types that makes them ill-suited to mainstream or imperative languages. Swift uses them, and people love them!

A Function

The core operation on skew heaps is the skew merge.

merge :: Ord a => Tree a -> Tree a -> Tree a
merge Leaf ys = ys
merge xs Leaf = xs
merge xs@(Node x xl xr) ys@(Node y yl yr)
   | x <= y    = Node x (merge ys xr) xl
   | otherwise = Node y (merge xs yr) yl

def merge(lhs, rhs):
  if not lhs._is_node:
    return rhs
  if not rhs._is_node:
    return lhs
  if lhs._data <= rhs._data:
    return Tree(lhs._data,
                merge(rhs, lhs._rchild),
                lhs._lchild)
  else:
    return Tree(rhs._data,
                merge(lhs, rhs._rchild),
                rhs._lchild)

The standout feature here is pattern matching. In Haskell, we’re able to write the function as we might describe it: “in this case, I’ll do this, in this other case, I’ll do this, etc.”. In Python, we are forced to think of the truth tables and sequential testing. What do I mean by truth tables? Consider the following version of the Python function above:

def merge(lhs, rhs):
  if lhs._is_node:
    if rhs._is_node:
      if lhs._data <= rhs._data:
        return Tree(lhs._data,
                    merge(rhs, lhs._rchild),
                    lhs._lchild)
      else:
        return Tree(rhs._data,
                    merge(lhs, rhs._rchild),
                    rhs._lchild)
    else:
      return lhs
  else:
    return rhs

You may even write this version first: it initially seems more natural (because _is_node is used in the positive). Here’s the question, though: does it do the same thing as the previous version? Are you sure? Which else is connected to which if? Does every if have an else? (some linters will suggest you remove some of the elses above, since the if-clause has a return statement in it!)

The fact of the matter is that we are forced to do truth tables of every condition in our minds, rather than saying what we mean (as we do in the Haskell version).

The other thing we’re saved from in the Haskell version is accessing undefined fields. In the Python function, we know accessing lhs._data is correct since we verified that lhs is a node. But the logic to do this verification is complex: we checked if it wasn’t a node, and returned if that was true… so if it is true that lhs isn’t a node, we would have returned, but we didn’t, so…

Bear in mind all of these logic checks happened four lines before the actual access: this can get much uglier in practice! Compare this to the Haskell version: we only get to bind variables if we’re sure they exist. The syntax itself prevents us from accessing fields which aren’t defined, in a simple way.

Pattern matching has existed for years in many different forms: even C has switch statements. The added feature of destructuring is available in languages like Swift, Rust, and the whole ML family. Ask for it in your language today!

Now that we have that function, we get to define others in terms of it:

insert :: Ord a => a -> Tree a -> Tree a
insert x = merge (Node x Leaf Leaf)

def insert(element, tree):
    tree.__dict__ = merge(
        node(element, leaf(), leaf()),
        tree
    ).__dict__.copy()

A Word on Types

I haven’t mentioned Haskell’s type system so far, as it’s been quite unobtrusive in the examples. And that’s kind of the point: despite more complex examples you’ll see online demonstrating the power of type classes and higher-kinded types, Haskell’s type system excels in these simpler cases.

merge :: Ord a => Tree a -> Tree a -> Tree a

Without much ceremony, this signature tells us:

The function takes two trees, and returns a third.
Both trees have to be filled with the same types of elements.
Those elements must have an order defined on them.

Type Inference

I feel a lot of people miss the point of this particular feature. Technically speaking, this feature allows us to write fewer type signatures, as Haskell will be able to guess most of them. Coming from something like Java, you might think that that’s an opportunity to shorten up some verbose code. It’s not! You’ll rarely find a Haskell program these days missing top-level type signatures: it’s easier to read a program with explicit type signatures, so people are advised to put them as much as possible.

(Amusingly, I often find older Haskell code snippets which are entirely devoid of type signatures. It seems that programmers were so excited about Hindley-Milner type inference that they would put it to the test as often as they could.)

Type inference in Haskell is actually useful in a different way. First, if I write the implementation of the merge function, the compiler will tell me the signature, which is extremely helpful for more complex examples. Take the following, for instance:

f x = ((x * 2) ^ 3) / 4

Remembering precisely which numeric type x needs to be is a little difficult (Floating? Real? Fractional?), but if I just ask the compiler it will tell me without difficulty.

The second use is kind of the opposite: if I have a hole in my program where I need to fill in some code, Haskell can help me along by telling me the type of that hole automatically. This is often enough information to figure out the entire implementation! In fact, there are some programs which will use this capability of the type checker to fill in the hole with valid programs, synthesising your code for you.

So often strong type systems can make you feel like you’re fighting more and more against the compiler. I hope these couple examples show that it doesn’t have to be that way.

When Things Go Wrong

The next function is “pop-min”:

popMin :: Ord a => Tree a -> Maybe (a, Tree a)
popMin Leaf = Nothing
popMin (Node x xl xr) = Just (x, merge xl xr)

def popMin(tree):
  if tree._is_node:
    res = tree._data
    tree.__dict__ = merge(
        tree._lchild, 
        tree._rchild
    ).__dict__.copy()
    return res
  else:
    raise IndexError

At first glance, this function should be right at home in Python. It mutates its input, and it has an error case. The code we’ve written here for Python is pretty idiomatic, also: other than the ugly deep copy, we’re basically just mutating the object, and using an exception for the exceptional state (when the tree is empty). Even the exception we use is the same exception as when you try and pop() from an empty list.

The Haskell code here mainly demonstrates a difference in API style you’ll see between the two languages. If something isn’t found, we just use Maybe. And instead of mutating the original variable, we return the new state in the second part of a tuple. What’s nice about this is that we’re only using simple core features like algebraic data types to emulate pretty complex features like exceptions in Python.

You may have heard that “Haskell uses monads to do mutation and exceptions”. This is not true. Yes, state and exceptions have patterns which technically speaking are “monadic”. But make no mistake: when we want to model “exceptions” in Haskell, we really just return a maybe (or an either). And when we want to do “mutation”, we return a tuple, where the second element is the updated state. You don’t have to understand monads to use them, and you certainly don’t “need” monads to do them. To drive the point home, the above code could actually equivalently have a type which mentions “the state monad” and “the maybe monad”:

popMin :: Ord a => StateT (Tree a) Maybe a

But there’s no need to!

Gluing It All Together

The main part of our task is now done: all that is left is to glue the various bits and pieces together. Remember, the overall algorithm builds up the heap from a list, and then tears it down using popMin. First, then, to build up the heap.

listToHeap :: Ord a => [a] -> Tree a
listToHeap = foldr insert Leaf

def listToHeap(elements):
  res = leaf()
  for el in elements:
    insert(el, res)
  return res

To my eye, the Haskell code here is significantly more “readable” than the Python. I know that’s a very subjective judgement, but foldr is a function so often used that it’s immediately clear what’s happening in this example.

Why didn’t we use a similar function in Python, then? We actually could have: python does have an equivalent to foldr, called reduce (it’s been relegated to functools since Python 3 (also technically it’s equivalent to foldl, not foldr)). We’re encouraged not to use it, though: the more pythonic code uses a for loop. Also, it wouldn’t work for our use case: the insert function we wrote is mutating, which doesn’t gel well with reduce.

I think this demonstrates another benefit of simple, functional APIs. If you keep things simple, and build things out of functions, they’ll tend to glue together well, without having to write any glue code yourself. The for loop, in my opinion, is “glue code”. The next function, heapToList, illustrates this even more so:

heapToList :: Ord a => Tree a -> [a]
heapToList = unfoldr popMin

def heapToList(tree):
  res = []
  try:
    while True:
      res.append(popMin(tree))
  except IndexError:
    return res

Again, things are kept simple in the Haskell example. We’ve stuck to data types and functions, and these data types and functions mesh well with each other. You might be aware that there’s some deep and interesting mathematics behind the foldr and unfoldr functions going on, and how they relate. We don’t need to know any of that here, though: they just work together well.

Again, Python does have a function which is equivalent to unfoldr: iter has an overload which will repeatedly call a function until it hits a sentinel value. But this doesn’t fit with the rest of the iterator model! Most iterators are terminated with the StopIteration exception; ours (like the pop function on lists) is terminated by the IndexError exception; and this function excepts a third version, terminated by a sentinel!

Finally, let’s write sort:

sort :: Ord a => [a] -> [a]
sort = heapToList . listToHeap

def sort(elements):
  return heapToList(listToHeap(elements))

This is just driving home the point: programs work well when they’re built out of functions, and you want your language to encourage you to build things out of functions. In this case, the sort function is built out of two smaller ones: it’s the essence of function composition.

Laziness

So I fully admit that laziness is one of the features of Haskell that does have downsides. I don’t think every language should be lazy, but I did want to say a little about it in regards to the sorting example here.

I tend to think that people overstate how hard it makes reasoning about space: it actually follows pretty straightforward rules, which you can generally step through in yourself (compared to, for instance, rewrite rules, which are often black magic!)

In modern programming, people will tend to use laziness it anyway. Python is a great example: the itertools library is almost entirely lazy. Actually making use of the laziness, though, is difficult and error-prone. Above, for instance, the heapToList function is lazy in Haskell, but strict in Python. Converting it to a lazy version is not the most difficult thing in the world:

def heapToList(tree):
  try:
    while True:
      yield popMin(tree)
  except IndexError:
    pass

But now, suddenly, the entire list API won’t work. What’s more, if we try and access the first element of the returned value, we mutate the whole thing: anyone else looking at the output of the generator will have it mutated out from under them!

Laziness fundamentally makes this more reusable. Take our popMin function: if we just want to view the smallest element, without reconstructing the rest of the tree, we can actually use popMin as-is. If we don’t use the second element of the tuple we don’t pay for it. In Python, we need to write a second function.

Testing

Testing the sort function in Haskell is ridiculously easy. Say we have an example sorting function that we trust, maybe a slow but obvious insertion sort, and we want to make sure that our fast heap sort here does the same thing. This is the test:

quickCheck (\xs -> sort (xs :: [Int]) === insertionSort xs)

In that single line, the QuickCheck library will automatically generate random input, run each sort function on it, and compare the two outputs, giving a rich diff if they don’t match.

Conclusion

This post was meant to show a few features like pattern-matching, algebraic data types, and function-based APIs in a good light. These ideas aren’t revolutionary any more, and plenty of languages have them, but unfortunately several languages don’t. Hopefully the example here illustrates a little why these features are good, and pushes back against the idea that algebraic data types are too complex for mainstream languages.

Update 5/10/2019

This got posted to /r/haskell and hackernews. You can find me arguing in the comments there a little bit: I’m oisdk on hackernews and u/foBrowsing on reddit.

There are two topics that came up a bunch that I’d like to add to this post. First I’ll just quote one of the comments from Beltiras:

Friend of mine is always trying to convert me. Asked me to read this yesterday evening. This is my take on the article:

Most of my daily job goes into gluing services (API endpoints to databases or other services, some business logic in the middle). I don’t need to see yet another exposition of how to do algorithmic tasks. Haven’t seen one of those since doing my BSc. Show me the tools available to write a daemon, an http server, API endpoints, ORM-type things and you will have provided me with tools to tackle what I do. I’ll never write a binary tree or search or a linked list at work.

If you want to convince me, show me what I need to know to do what I do.

and my response:

I wasn’t really trying to convince anyone to use Haskell at their day job: I am just a college student, after all, so I would have no idea what I was talking about!

I wrote the article a while ago after being frustrated using a bunch of Go and Python at an internship. Often I really wanted simple algebraic data types and pattern-matching, but when I looked up why Go didn’t have them I saw a lot of justifications that amounted to “functional features are too complex and we’re making a simple language. Haskell is notoriously complex”. In my opinion, the res, err := fun(); if err != nil (for example) pattern was much more complex than the alternative with pattern-matching. So I wanted to write an article demonstrating that, while Haskell has a lot of out-there stuff in it, there’s a bunch of simple ideas which really shouldn’t be missing from any modern general-purpose language.

As to why I used a binary tree as the example, I thought it was pretty self-contained, and I find skew heaps quite interesting.

The second topic was basically people having a go at my ugly Python; to which I say: fair enough! It is not my best. I wasn’t trying necessarily to write the best Python I could here, though, rather I was trying to write the “normal” implementation of a binary tree. If I was to implement a binary tree of some sort myself, though, I would certainly write it in an immutable style rather than the style here. Bear in mind as well that much of what I’m arguing for is stylistic: I think (for instance) that it would be better to use reduce in Python more, and I think the move away from it is a bad thing. So of course I’m not going to use reduce when I’m showing the Python version: I’m doing a compare and contrast!

Yes, I know about the new dataclasses feature. However, it’s wrapped up with the (also new) type hints module, and as such is much more complicated to use. As the purpose of the Python code here is to provide something of a lingua franca for non-Haskellers, I decided against using it. That said, the problems outlined are not solved by dataclasses.↩︎

Bachelor's Thesis

Donnacha Oisín Kidney — Sun, 14 Jul 2019 00:00:00 UT

Posted on July 14, 2019

Tags: Agda

I recently finished my undergrad degree in UCC. I’m putting my final-year project up here for reference purposes.

Here is the pdf.

And here’s a bibtext entry:

@thesis{kidney_automatically_2019,
	address = {Cork, Ireland},
	type = {Bachelor thesis},
	title = {Automatically and {Efficiently} {Illustrating} {Polynomial} {Equalities} in {Agda}},
	url = {https://doisinkidney.com/pdfs/bsc-thesis.pdf},
	abstract = {We present a new library which automates the construction of equivalence proofs between polynomials over commutative rings and semirings in the programming language Agda [20]. It is signi cantly faster than Agda’s existing solver. We use re ection to provide a sim- ple interface to the solver, and demonstrate how to use the constructed proofs to provide step-by-step solutions.},
	language = {en},
	school = {University College Cork},
	author = {Kidney, Donnacha Oisín},
	month = apr,
	year = {2019}
}

Solving Programming Puzzles without using your Brain

Donnacha Oisín Kidney — Tue, 04 Jun 2019 00:00:00 UT

Posted on June 4, 2019

Tags: Python

This post is a write-up of a solution to part of a programming puzzle I did yesterday. It’s a little different than the usual “solution + theory” approach, though: I’m going to talk about the actual steps you’d need to take to get to the solution (i.e. what to google, what intermediate code looks like, etc.). Often write ups like this are presented as finished artefacts, with little info on the tricks or techniques the author used to jog their intuition into figuring out the puzzle (or where some intermediate step requires a leap of insight). In actual fact, this particular puzzle requires almost no insight at all: I’m going to show how to get to a working solution without understanding any of the theory behind it!

Spoilers ahead for the google foobar problem “Distract the Guards”.

The Problem

We’re interested in a particular type of sequences of pairs of numbers. These sequences are generated from a starting pair $n$ and $m$ like so:

If $n$ and $m$ are equal, the sequence stops.

Otherwise, the smaller number is subtracted from the larger, and then the smaller is doubled, and the sequence continues with these two numbers.

Here’s an example starting with 3 and 5:

     3, 5
     6, 2
     4, 4
---- done ----

Once it hits 4, 4, the first condition is met, and the sequence stops. Not all of these sequences stop, however:

     1, 4
     2, 3
     1, 4
---- done ----

As you can see, in this case we loop back around to 1, 4: our task is to figure out, given a pair of numbers, whether the sequence generated by them loops forever, or stops at some point.

Step 1: Write a Dumb Solution

This step is crucial: before trying to figure out any of the deep mathematics behind the problem, write the dumbest thing that could work. You’re going to need it, anyway, to test your faster versions against, and besides, it might be good enough as-is!

def sequence(n,m):
    while n != m:
        yield (n,m)
        if n < m:
            m -= n
            n *= 2
        else:
            n -= m
            m *= 2

def loops(xs):
    seen = set()
    for x in xs:
        if x in seen:
            return True
        else:
            seen.add(x)
    return False

def solution(n,m):
    return loops(sequence(n,m))

The first function actually generates the sequence we’re interested in: it uses python’s generators to do so. The second function is just a generic function that checks a sequence for duplicates. Finally, the last function answers the question we’re interested in.

Step 2: Graph it

Next, we want to try and spot some patterns in the answers the function generates. Remember, we’re not really interested in figuring out the theory at this point: if we find out that a loop only happens when both numbers are even (for instance), that’s good enough for us and we can stop there!

We humans are pattern-matching machines: to leverage our abilities, though, we will need to visualise the data somehow. In this case, I’m going to plot a simple scatter graph to the terminal, using the following code (I apologise for my terrible indentation style):

print(
    '\n'.join(
        ''.join(
            '*' if solution(x,y) else ' '
            for x in range(1,81)
        )
        for y in range(100,0,-1)
    )
)

And we get the following output:

Output

Deriving a Linear-Time Applicative Traversal of a Rose Tree

Donnacha Oisín Kidney — Tue, 28 May 2019 00:00:00 UT

Posted on May 28, 2019

Part 7 of a 10-part series on Breadth-First Traversals

Tags: Haskell

The Story so Far

Currently, we have several different ways to enumerate a tree in breadth-first order. The typical solution (which is the usual recommended approach in imperative programming as well) uses a queue, as described by Okasaki (2000). If we take the simplest possible queue (a list), we get a quadratic-time algorithm, with an albeit simple implementation. The next simplest version is to use a banker’s queue (which is just a pair of lists). From this version, if we inline and apply identities like the following:

foldr f b . reverse = foldl (flip f) b

We’ll get to the following definition:

bfe :: Forest a -> [a]
bfe ts = foldr f b ts []
  where
    f (Node x xs) fw bw = x : fw (xs : bw)

    b [] = []
    b qs = foldl (foldr f) b qs []

We can get from this function to others (like one which uses a corecursive queue, and so on) through a similar derivation. I might some day write a post on each derivation, starting from the simple version and demonstrating how to get to the more efficient at each step.

For today, though, I’m interested in the traversal of a rose tree. Traversal, here, of course, is in the applicative sense.

Thus far, I’ve managed to write linear-time traversals, but they’ve been unsatisfying. They work by enumerating the tree, traversing the effectful function over the list, and then rebuilding the tree. Since each of those steps only takes linear time, the whole thing is indeed a linear-time traversal, but I hadn’t been able to fuse away the intermediate step.

Phases

The template for the algorithm I want comes from the Phases applicative (Easterly 2019):

data Phases f a where
  Lift   :: f a -> Phases f a
  (:<*>) :: f (a -> b) -> Phases f a -> Phases f b

We can use it to write a breadth-first traversal like so:

bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
bft f = runPhases . go
  where
    go (Node x xs) = liftA2 Node (Lift (f x)) (later (traverse go xs))

The key component that makes this work is that it combines applicative effects in parallel:

instance Functor f => Functor (Phases f) where
    fmap f (Lift x) = Lift (fmap f x)
    fmap f (fs :<*> xs) = fmap (f.) fs :<*> xs
    
instance Applicative f => Applicative (Phases f) where
    pure = Lift . pure
    Lift fs      <*> Lift xs      = Lift (fs <*> xs)
    (fs :<*> gs) <*> Lift xs      = liftA2 flip fs xs :<*> gs
    Lift fs      <*> (xs :<*> ys) = liftA2 (.)  fs xs :<*> ys
    (fs :<*> gs) <*> (xs :<*> ys) = liftA2 c    fs xs :<*> liftA2 (,) gs ys
      where
        c f g ~(x,y) = f x (g y)

We’re also using the following helper functions:

runPhases :: Applicative f => Phases f a -> f a
runPhases (Lift x) = x
runPhases (fs :<*> xs) = fs <*> runPhases xs

later :: Applicative f => Phases f a -> Phases f a
later = (:<*>) (pure id)

The problem is that it’s quadratic: the traverse in:

go (Node x xs) = liftA2 Node (Lift (f x)) (later (traverse go xs))

Hides some expensive calls to <*>.

A Roadmap for Optimisation

The problem with the Phases traversal is actually analogous to another function for enumeration: levels from Gibbons (2015).

levels :: Tree a -> [[a]]
levels (Node x xs) = [x] : foldr lzw [] (map levels xs)
  where
    lzw [] ys = ys
    lzw xs [] = xs
    lzw (x:xs) (y:ys) = (x ++ y) : lzw xs ys

lzw takes the place of <*> here, but the overall issue is the same: we’re zipping at every point, making the whole thing quadratic.

However, from the above function we can derive a linear time enumeration. It looks like this:

levels :: Tree a -> [[a]]
levels ts = f ts []
  where
    f (Node x xs) (q:qs) = (x:q) : foldr f qs xs
    f (Node x xs) []     = [x]   : foldr f [] xs

Our objective is clear, then: try to derive the linear-time implementation of bft from the quadratic, in a way analogous to the above two functions. This is actually relatively straightforward once the target is clear: the rest of this post is devoted to the derivation.

Derivation

First, we start off with the original bft.

bft :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)
bft f = runPhases . go
  where
    go (Node x xs) = liftA2 Node (Lift (f x)) (later (traverse go xs))

Inline traverse.

Implicit Corecursive Queues

Donnacha Oisín Kidney — Tue, 14 May 2019 00:00:00 UT

Posted on May 14, 2019

Part 6 of a 10-part series on Breadth-First Traversals

Tags: Haskell

Fusion

I was looking again at one of my implementations of breadth-first traversals:

bfe :: Tree a -> [a]
bfe r = f r b []
  where
    f (Node x xs) fw bw = x : fw (xs : bw)
  
    b [] = []
    b qs = foldl (foldr f) b qs []

And I was wondering if I could fuse away the intermediate list. On the following line:

f (Node x xs) fw bw = x : fw (xs : bw)

The xs : bw is a little annoying, because we know it’s going to be consumed eventually by a fold. When that happens, it’s often a good idea to remove the list, and just inline the fold. In other words, if you see the following:

foldr f b (x : y : [])

You should replace it with this:

f x (f y b)

If you try and do that with the above definition, you get something like the following:

bfenum :: Tree a -> [a]
bfenum t = f t b b
  where
    f (Node x xs) fw bw = x : fw (bw . flip (foldr f) xs)
    b x = x b

Infinite Types

The trouble is that the above comes with type errors:

Cannot construct the infinite type: b ~ (b -> c) -> [a]

This error shows up occasionally when you try and do heavy church-encoding in Haskell. You get a similar error when trying to encode the Y combinator:

y = \f -> (\x -> f (x x)) (\x -> f (x x))

• Occurs check: cannot construct the infinite type: t0 ~ t0 -> t

The solution for the y combinator is to use a newtype, where we can catch the recursion at a certain point to help the typechecker.

newtype Mu a = Mu (Mu a -> a)
y f = (\h -> h $ Mu h) (\x -> f . (\(Mu g) -> g) x $ x)

The trick for our queue is similar:

newtype Q a = Q { q :: (Q a -> [a]) -> [a] }

bfenum :: Tree a -> [a]
bfenum t = q (f t b) e
  where
    f (Node x xs) fw = Q (\bw -> x : q fw (bw . flip (foldr f) xs))
    b = fix (Q . flip id)
    e = fix (flip q)

This is actually equivalent to the continuation monad:

newtype Fix f = Fix { unFix :: f (Fix f) }

type Q a = Fix (ContT a [])

q = runContT . unFix

bfenum :: Tree a -> [a]
bfenum t = q (f t b) e
  where
    f (Node x xs) fw = Fix (mapContT (x:) (flip (foldr f) xs <$> unFix fw))
    b = fix (Fix . pure)
    e = fix (flip q)

Terminating

There’s a problem though: this algorithm never checks for an end. That’s ok if there isn’t one, mind you. For instance, with the following “unfold” function:

infixr 9 #.
(#.) :: Coercible b c => (b -> c) -> (a -> b) -> a -> c
(#.) _ = coerce
{-# INLINE (#.) #-}

bfUnfold :: (a -> (b,[a])) -> a -> [b]
bfUnfold f t = g t (fix (Q #. flip id)) (fix (flip q))
  where
    g b fw bw = x : q fw (bw . flip (foldr ((Q .) #. g)) xs)
      where
        (x,xs) = f b

We can write a decent enumeration of the rationals.

-- Stern-Brocot
rats1 :: [Rational]
rats1 = bfUnfold step ((0,1),(1,0))
  where
    step (lb,rb) = (n % d,[(lb , m),(m , rb)])
      where
        m@(n,d) = adj lb rb
    adj (w,x) (y,z) = (w+y,x+z)
    
-- Calkin-Wilf
rats2 :: [Rational]
rats2 = bfUnfold step (1,1)
  where
    step (m,n) = (m % n,[(m,m+n),(n+m,n)])

However, if we do want to stop at some point, we need a slight change to the queue type.

newtype Q a = Q { q :: Maybe (Q a -> [a]) -> [a] }

bfenum :: Tree a -> [a]
bfenum t = q (f t b) e
  where 
    f (Node x xs) fw = Q (\bw -> x : q fw (Just (m bw . flip (foldr f) xs)))
    b = fix (Q . maybe [] . flip ($))
    e = Nothing
    m = fromMaybe (flip q e)

Monadic

We can actually add in a monad to the above unfold without much difficulty.

newtype Q m a = Q { q :: Maybe (Q m a -> m [a]) -> m [a] }

bfUnfold :: Monad m => (a -> m (b,[a])) -> a -> m [b]
bfUnfold f t = g t b e
  where
    g s fw bw = f s >>= 
       \ ~(x,xs) -> (x :) <$>  q fw (Just (m bw . flip (foldr ((Q .) #. g)) xs))
        
    b = fix (Q #. maybe (pure []) . flip ($))
    e = Nothing
    m = fromMaybe (flip q e)

And it passes the torture tests for a linear-time breadth-first unfold from Feuer (2015). It breaks when you try and use it to build a tree, though.

Phases

Finally, we can try and make the above code a little more modular, by actually packaging up the queue type as a queue.

newtype Q a = Q { q :: Maybe (Q a -> [a]) -> [a] }
newtype Queue a = Queue { runQueue :: Q a -> Q a }

now :: a -> Queue a
now x = Queue (\fw -> Q (\bw -> x : q fw bw))
    
delay :: Queue a -> Queue a
delay xs = Queue (\fw -> Q (\bw -> q fw (Just (m bw . runQueue xs))))
  where
    m = fromMaybe (flip q Nothing)
    
instance Monoid (Queue a) where
    mempty = Queue id
    mappend (Queue xs) (Queue ys) = Queue (xs . ys)
    
run :: Queue a -> [a]
run (Queue xs) = q (xs b) Nothing
  where
    b = fix (Q . maybe [] . flip ($))

bfenum :: Tree a -> [a]
bfenum t = run (f t)
  where 
    f (Node x xs) = now x <> delay (foldMap f xs)

At this point, our type is starting to look a lot like the Phases type from Noah Easterly’s tree-traversals package. This is exciting: the Phases type has the ideal interface for level-wise traversals. Unfortunately, it has the wrong time complexity for <*> and so on: my suspicion is that the queue type above here is to Phases as the continuation monad is to the free monad. In other words, we’ll get efficient construction at the expense of no inspection. Unfortunately, I can’t figure out how to turn the above type into an applicative. Maybe in a future post!

Finally, a lot of this is working towards finally understanding Smith (2009) and Allison (2006).

Allison, Lloyd. 2006. “Circular Programs and Self-Referential Structures.” Software: Practice and Experience 19 (2) (October): 99–109. doi:10.1002/spe.4380190202. http://users.monash.edu/~lloyd/tildeFP/1989SPE/.

Feuer, David. 2015. “Is a lazy, breadth-first monadic rose tree unfold possible?” Question. Stack Overflow. https://stackoverflow.com/q/27748526.

Smith, Leon P. 2009. “Lloyd Allison’s Corecursive Queues: Why Continuations Matter.” The Monad.Reader 14 (14) (July): 28. https://meldingmonads.files.wordpress.com/2009/06/corecqueues.pdf.

Concatenative Programming; The Free Monoid of Programming Languages

Donnacha Oisín Kidney — Sat, 11 May 2019 00:00:00 UT

Posted on May 11, 2019

Tags: Concatenative, Haskell

This post demonstrates a simple encoding of a (typed) concatenative language in Haskell.

Point-free style is one of the distinctive markers of functional programming languages. Want to sum a list? That’s as easy as:

sum = foldr (+) 0

Now I want to sum every number after adding one to it.

sumSuccs = foldr (+) 0 . map ((+) 1)

One more step to make this function truly abstract™ and general™: we’ll allow the user to supply their own number to add

sumAdded = foldr (+) 0 . map . (+)

And here the trouble begins. The above expression won’t actually type check. In fact, it’ll give a pretty terrible error message:

• Non type-variable argument in the constraint: Num [a]
  (Use FlexibleContexts to permit this)
• When checking the inferred type
    sumThoseThat :: forall a.
                    (Num [a], Foldable ((->) [a])) =>
                    a -> [a]

I remember as a beginner being confused by similar messages. What’s FlexibleContexts? I had thought that the “point-free style” just meant removing the last variable from an expression if it’s also the last argument:

sum xs = foldr (+) 0 xs
sum = foldr (+) 0

Why doesn’t it work here?

Well, it doesn’t work because the types don’t line up, but I’m going to try and explain a slightly different perspective on the problem, which is associativity.

To make it a little clearer, let’s see what happens when we point-fill the expression:

sumAdded n xs = (foldr(+) 0 . (map . (+))) n xs
             => foldr(+) 0 ((map . (+)) n) xs
             => foldr(+) 0 (map ((+) n)) xs

Indeed, the problem is the placement of the parentheses. What we want at the end is:

             => foldr(+) 0 (map ((+) n) xs)

But, no matter. We have to jiggle the arguments around, or we could use something terrible like this:

infixr 9 .:
(.:) = (.).(.)

sumAdded = foldr (+) 0 .: map . (+)

Is there something, though, that could do this automatically?

Associativity

We run into a similar problem in Agda. We’re forever having to prove statements like:

(x + y) + z ≡ x + (y + z)
x ≡ x + 0

There are a couple of ways to get around the issue, and for monoids there’s a rich theory of techniques. I’ll just show one for now, which relies on the endomorphism monoid. This monoid is created by partially applying the monoid’s binary operator:

Endo : Set
Endo = ℕ → ℕ

⟦_⇑⟧ : ℕ → Endo
⟦ n ⇑⟧ m = n + m

And you can get back to the underlying monoid by applying it to the neutral element:

⟦_⇓⟧ : Endo → ℕ
⟦ n ⇓⟧ = n 0

Here’s the important parts: first, we can lift the underlying operation into the endomorphism:

_⊕_ : Endo → Endo → Endo
xs ⊕ ys = λ x → xs (ys x)

⊕-homo : ∀ n m → ⟦ ⟦ n ⇑⟧ ⊕ ⟦ m ⇑⟧ ⇓⟧ ≡ n + m
⊕-homo n m = cong (n +_) (+-identityʳ m)

And second, it’s definitionally associative.

⊕-assoc : ∀ x y z → (x ⊕ y) ⊕ z ≡ x ⊕ (y ⊕ z)
⊕-assoc _ _ _ = refl

These are all clues as to how to solve the composition problem in the Haskell code above. We need definitional associativity, somehow. Maybe we can get it from the endomorphism monoid?

State

You’re probably familiar with Haskell’s state monad:

newtype State s a = State { runState :: s -> (a, s) }

It can help a lot when you’re threading around fiddly accumulators and so on.

nub :: Ord a => [a] -> [a]
nub = go Set.empty
  where
    go seen [] = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise = x : go (Set.insert x seen) xs

nub :: Ord a => [a] -> [a]
nub = flip evalState Set.empty . go
  where
    go [] = pure []
    go (x:xs) = do
        seen <- gets (Set.member x)
        if seen
          then go xs
          else do
              modify (Set.insert x)
              (x:) <$> go xs

Of course, these days state is a transformer:

newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }

This lets us stack multiple effects on top of each other: error handling, IO, randomness, even another state monad. In fact, if you do stack another state monad on top, you might be surprised by the efficiency of the code it generates:

type DoubleState s1 s2 a = StateT s1 (State s2) a
                        => s1 -> State s2 (a, s1)
                        => s1 -> s2 -> ((a, s1), s2)

It’s nothing earth shattering, but it inlines and optimises well. That output is effectively a left-nested list, also.

Multi-State

If we can do one, and we can do two, why not more? Can we generalise the state pattern to an arbitrary number of variables? First we’ll need a generic tuple:

infixr 5 :-
data Stack (xs :: [Type]) :: Type where
    Nil  :: Stack '[]
    (:-) :: x -> Stack xs -> Stack (x : xs)

Then, the state type.

newtype State xs a = State { runState :: Stack xs -> (a, Stack xs) }

We can actually clean the definition up a little: instead of a tuple at the other end, why not push it onto the stack.

newtype State xs a = State { runState :: Stack xs -> Stack (a : xs) }

In fact, let’s make this as polymorphic as possible. We should be able to change the state is we so desire.

infixr 0 :->
type (:->) xs ys = Stack xs -> Stack ys

And suddenly, our endomorphism type from above shows up again.

We can, of course, get back our original types.

newtype State xs a = State { runState :: xs :-> a : xs }

And it comes with all of the instances you might expect:

instance Functor (State xs) where
    fmap f xs = State (\s -> case runState xs s of
        (x :- ys) -> f x :- ys)
        
instance Applicative (State xs) where
    pure x = State (x :-)
    fs <*> xs = State (\s -> case runState fs s of
        (f :- s') -> case runState xs s' of
            (x :- s'') -> f x :- s'')
            
instance Monad (State xs) where
    xs >>= f = State (\s -> case runState xs s of
        y :- ys -> runState (f y) ys)

Polymorphism

But what’s the point? So far we’ve basically just encoded an unnecessarily complicated state transformer. Think back to the stacking of states. Written in the mtl style, the main advantage of stacking monads like that is you can write code like the following:

pop :: (MonadState [a] m, MonadError String m) => m a
pop = get >>= \case
    [] -> throwError "pop: empty list"
    x:xs -> do
        put xs 
        pure x

In other words, we don’t care about the rest of m, we just care that it has, somewhere, state for an [a].

This logic should apply to our stack transformer, as well. If it only cares about the top two variables, it shouldn’t care what the rest of the list is. In types:

infixr 0 :->
type (:->) xs ys = forall zs. Stack (xs ++ zs) -> Stack (ys ++ zs)

And straight away we can write some of the standard combinators:

dup :: '[a] :-> '[a,a]
dup (x :- xs) = (x :- x :- xs)

swap :: '[x,y] :-> '[y,x]
swap (x :- y :- xs) = y :- x :- xs

drop :: '[x,y] :-> '[y]
drop (_ :- xs) = xs

infixl 9 !
(f ! g) x = g (f x)

You’ll immediately run into trouble if you try to work with some of the more involved combinators, though. Quote should have the following type, for instance:

quote :: (xs :-> ys) -> '[] :-> '[ xs :-> ys ]

But GHC complains again:

• Illegal polymorphic type: xs :-> ys
  GHC doesn't yet support impredicative polymorphism
• In the type signature:
    quote :: (xs :-> ys) -> '[] :-> '[xs :-> ys]

I won’t go into the detail of this particular error: if you’ve been around the block with Haskell you know that it means “wrap it in a newtype”. If we do that, though, we get yet more errors:

newtype (:~>) xs ys = Q { d :: xs :-> ys }

• Couldn't match type ‘ys ++ zs0’ with ‘ys ++ zs’
  Expected type: Stack (xs ++ zs) -> Stack (ys ++ zs)
    Actual type: Stack (xs ++ zs0) -> Stack (ys ++ zs0)
  NB: ‘++’ is a type function, and may not be injective

This injectivity error comes up often. It means that GHC needs to prove that the input to two functions is equal, but it only knows that their outputs are. This is a doubly serious problem for us, as we can’t do type family injectivity on two type variables (in current Haskell). To solve the problem, we need to rely on a weird mishmash of type families and functional dependencies:

type family (++) xs ys where
    '[] ++ ys = ys
    (x : xs) ++ ys = x : (xs ++ ys)
    
class (xs ++ ys ~ zs) => Conc xs ys zs | xs zs -> ys where
    conc :: Stack xs -> Stack ys -> Stack zs
    
instance Conc '[] ys ys where
    conc _ ys = ys
    
instance Conc xs ys zs => Conc (x : xs) ys (x : zs) where
    conc (x :- xs) ys = x :- conc xs ys

infixr 0 :->
type (:->) xs ys = forall zs yszs. Conc ys zs yszs => Stack (xs ++ zs) -> Stack yszs

And it does indeed work:

pure :: a -> '[] :-> '[a]
pure = (:-)

newtype (:~>) xs ys = Q { d :: xs :-> ys }

quote :: (xs :-> ys) -> '[] :-> '[ xs :~> ys ]
quote x = pure (Q x)

dot :: forall xs ys. ((xs :~> ys) : xs) :-> ys
dot (x :- xs) = d x xs

true :: (xs :~> ys) : (xs :~> ys) : xs :-> ys
true = swap ! drop ! dot

false :: (xs :~> ys) : (xs :~> ys) : xs :-> ys
false = drop ! dot

test :: '[] :-> '[ '[a] :~> '[a,a] ]
test = quote dup

Interestingly, these combinators represent the monadic operations on state (dot = join, pure = pure, etc.)

And can we get the nicer composition of the function from the intro? Kind of:

sumAdded = quote add ! curry ! dot ! map ! sum

Here are some references for concatenative languages: Okasaki (2002), Purdy (2012), Kerby (2007), Okasaki (2003).

Kerby, Brent. 2007. “The Theory of Concatenative Combinators.” http://tunes.org/\%7Eiepos/joy.html.

Okasaki, Chris. 2002. “Techniques for embedding postfix languages in Haskell.” In Proceedings of the ACM SIGPLAN workshop on Haskell - Haskell ’02, 105–113. Pittsburgh, Pennsylvania: ACM Press. doi:10.1145/581690.581699. http://portal.acm.org/citation.cfm?doid=581690.581699.

———. 2003. “THEORETICAL PEARLS: Flattening combinators: Surviving without parentheses.” Journal of Functional Programming 13 (4) (July): 815–822. doi:10.1017/S0956796802004483. https://www.cambridge.org/core/journals/journal-of-functional-programming/article/theoretical-pearls/3E99993FE5464986AD94D292FF5EA275.

Purdy, Jon. 2012. “The Big Mud Puddle: Why Concatenative Programming Matters.” The Big Mud Puddle. https://evincarofautumn.blogspot.com/2012/02/why-concatenative-programming-matters.html.

Some Tricks for List Manipulation

Donnacha Oisín Kidney — Wed, 08 May 2019 00:00:00 UT

Posted on May 8, 2019

Tags: Haskell

This post is a collection of some of the tricks I’ve learned for manipulating lists in Haskell. Each one starts with a puzzle: you should try the puzzle yourself before seeing the solution!

The Tortoise and the Hare

How can you split a list in half, in one pass, without taking its length?

This first one is a relatively well-known trick, but it occasionally comes in handy, so I thought I’d mention it. The naive way is as follows:

splitHalf xs = splitAt (length xs `div` 2) xs

But it’s unsatisfying: we have to traverse the list twice, and we’re taking its length (which is almost always a bad idea). Instead, we use the following function:

splitHalf :: [a] -> ([a],[a])
splitHalf xs = go xs xs
  where
    go (y:ys) (_:_:zs) = first (y:) (go ys zs)
    go ys _ = ([],ys)

The “tortoise and the hare” is the two arguments to go: it traverses the second one twice as fast, so when it hits the end, we know that the first list must be halfway done.

There and Back Again

Given two lists, xs and ys, write a function which zips xs with the reverse of ys (in one pass).

There’s a lovely paper (Danvy and Goldberg 2005) which goes though a number of tricks for how to do certain list manipulations “in reverse”. Their technique is known as “there and back again”. However, I’d like to describe a different way to get to the same technique, using folds.

Whenever I need to do some list manipulation in reverse (i.e., I need the input list to be reversed), I first see if I can rewrite the function as a fold, and then just switch out foldr for foldl.

For our puzzle here, we need to first write zip as a fold:

zip :: [a] -> [b] -> [(a,b)]
zip = foldr f b
  where
    f x k (y:ys) = (x,y) : k ys
    f x k [] = []
    b _ = []

If that looks complex, or difficult to write, don’t worry! There’s a systematic way to get to the above definition from the normal version of zip. First, let’s start with a normal zip:

zip :: [a] -> [b] -> [(a,b)]
zip [] ys = []
zip xs [] = []
zip (x:xs) (y:ys) = (x,y) : zip xs ys

Then, we need to turn it into a case-tree, where the first branch is on the list we want to fold over. In other words, we want the function to look like this:

zip xs = case xs of
  ???

To figure out the cases, we factor out the cases in the original function. Since the second clause (zip xs [] = []) is only reachable when xs /= [], it’s effectively a case for the x:xs branch.

zip :: [a] -> [b] -> [(a,b)]
zip xs = case xs of
    [] -> \_ -> []
    x:xs -> \case
        [] -> []
        y:ys -> (x,y) : zip xs ys

Now, we rewrite the different cases to be auxiliary functions:

zip :: [a] -> [b] -> [(a,b)]
zip xs = case xs of
    [] -> b
    x:xs -> f x xs
  where
    b = \_ -> []
    f = \x xs -> \case
        [] -> []
        y:ys -> (x,y) : zip xs ys

And finally, we refactor the recursive call to the first case expression.

zip :: [a] -> [b] -> [(a,b)]
zip xs = case xs of
    [] -> b
    x:xs -> f x (zip xs)
  where
    b = \_ -> []
    f = \x xs -> \case
        [] -> []
        y:ys -> (x,y) : xs ys

Then those two auxiliary functions are what you pass to foldr!

So, to reverse it, we simply take wherever we wrote foldr f b, and replace it with foldl (flip f) b:

zipRev :: [a] -> [b] -> [(a,b)]
zipRev = foldl (flip f) b
  where
    f x k (y:ys) = (x,y) : k ys
    f x k [] = []
    b _ = []

Of course, we’re reversing the wrong list here. Fixing that is simple:

zipRev :: [a] -> [b] -> [(a,b)]
zipRev = flip (foldl (flip f) b)
  where
    f y k (x:xs) = (x,y) : k xs
    f y k [] = []
    b _ = []

Maintaining Laziness

Rewrite the above function without using continuations.

zipRev, as written above, actually uses continuation-passing style. In most languages (including standard ML, which was the one used in Danvy and Goldberg (2005)), this is pretty much equivalent to a direct-style implementation (modulo some performance weirdness). In a lazy language like Haskell, though, continuation-passing style often makes things unnecessarily strict.

Consider the church-encoded pairs:

newtype Pair a b
    = Pair
    { runPair :: forall c. (a -> b -> c) -> c
    }
    
firstC :: (a -> a') -> Pair a b -> Pair a' b
firstC f p = Pair (\k -> runPair p (k . f))

firstD :: (a -> a') -> (a, b) -> (a', b)
firstD f ~(x,y) = (f x, y)

fstD :: (a, b) -> a
fstD ~(x,y) = x

fstC :: Pair a b -> a
fstC p = runPair p const

>>> fstC (firstC (const ()) undefined)
undefined

>>> fstD (firstD (const ()) undefined)
()

So it’s sometimes worth trying to avoid continuations if there is a fast direct-style solution. (alternatively, continuations can give you extra strictness when you do want it)

First, I’m going to write a different version of zipRev, which folds on the first list, not the second.

zipRev xs ys = foldl f (\_ r -> r) xs ys []
  where
    f k x (y:ys) r = k ys ((x,y):r)

Then, we inline the definition of foldl:

zipRev xs ys = foldr f id xs (\_ r -> r) ys []
  where
    f x k c = k (\(y:ys) r -> c ys ((x,y):r))

Then, as a hint, we tuple up the two accumulating parameters:

zipRev xs ys = foldr f id xs snd (ys,[])
  where
    f x k c = k (\((y:ys),r) -> c (ys,(x,y):r))

What we can see here is that we have two continuations stacked on top of each other. When this happens, they can often “cancel out”, like so:

zipRev xs ys = snd (foldr f (ys,[]) xs)
  where
    f x (y:ys,r) = (ys,(x,y):r)

And we have our direct-style implementation!

Note 14/05/2019: the “cancel-out” explanation there is a little handwavy, as I’m sure you’ll notice. However, there are a number of excellent explanations on this stackoverflow thread which explain it much better than I ever could. Thanks to Anders Kaseorg, Will Ness, user11228628, and to Joseph Sible -Sible (2019) for asking the question.

Manual Fusion

Detect that a list is a palindrome, in one pass.

We now know a good way to split a list in two, and a good way to zip a list with its reverse. We can combine the two to get a program that checks if a list is a palindrome. Here’s a first attempt:

isPal xs = all (uncurry (==)) (uncurry zipRev (splitHalf xs))

But this is doing three passes!

To get around it, we can manually do some fusion. Fusion is a technique where we can spot scenarios like the following:

foldr f b (x : y : [])

And translate them into a version without a list:

x `f` (y `f` b)

The trick is making sure that the consumer is written as a fold, and then we just put its f and b in place of the : and [] in the producer.

So, when we inline the definition of splitHalf into zipRev, we get the following:

zipRevHalf :: [a] -> [(a,a)]
zipRevHalf xs = snd (go xs xs)
  where
    go (y:ys) (_:_:zs) = f y (go ys zs)
    go (_:ys) [_]      = (ys,[])
    go ys []           = (ys,[])

    f x (y:ys,r) = (ys,(x,y):r)

isPal xs = all (uncurry (==)) (zipRevHalf xs)

(adding a special case for odd-length lists)

Finally, the all (uncurry (==)) is implemented as a fold also. So we can fuse it with the rest of the definitions:

isPal :: Eq a => [a] -> Bool
isPal xs = snd (go xs xs)
  where
    go (y:ys) (_:_:zs) = f y (go ys zs)
    go (_:ys) [_]      = (ys,True)
    go ys     []       = (ys,True)
    
    f x (y:ys,r) = (ys,(x == y) && r)

You may have spotted the writer monad over All there. Indeed, we can rewrite it to use the monadic bind:

isPal :: Eq a => [a] -> Bool
isPal xs = getAll (fst (go xs xs)) where
  go (y:ys) (_:_:zs) = f y =<< go ys zs
  go (_:ys) [_]      = pure ys
  go ys     []       = pure ys
  
  f y (z:zs) = (All (y == z), zs)

Eliminating Multiple Passes with Laziness

Construct a Braun tree from a list in linear time.

This is also a very well-known trick (Bird 1984), but today I’m going to use it to write a function for constructing Braun trees.

A Braun tree is a peculiar structure. It’s a binary tree, where adjacent branches can differ in size by only 1. When used as an array, it has $\mathcal{O}(\log n)$ lookup times. It’s enumerated like so:

     ┌─7
   ┌3┤
   │ └11
 ┌1┤
 │ │ ┌─9
 │ └5┤
 │   └13
0┤
 │   ┌─8
 │ ┌4┤
 │ │ └12
 └2┤
   │ ┌10
   └6┤
     └14

The objective is to construct a tree from a list in linear time, in the order defined above. Okasaki (1997) observed that, from the list:

[0..14]

Each level in the tree is constructed from chucks of powers of two. In other words:

[[0],[1,2],[3,4,5,6],[7,8,9,10,11,12,13,14]]

From this, we can write the following function:

rows k [] = []
rows k xs = (k , take k xs) : rows (2*k) (drop k xs)

build (k,xs) ts = zipWith3 Node xs ts1 ts2
  where
    (ts1,ts2) = splitAt k (ts ++ repeat Leaf)
    
fromList = head . foldr build [Leaf] . rows 1

The first place we’ll look to eliminate a pass is the build function. It combines two rows by splitting the second in half, and zipping it with the first.

>>> build (3, [x1,x2,x3]) [y1,y2,y3,y4,y5,y6]
[(x1,y1,y4),(x2,y2,y5),(x3,y3,y6)]

We don’t need to store the length of the first list, though, as we are only using it to split the second, and we can do that at the same time as the zipping.

zipUntil :: (a -> b -> c) -> [a] -> [b] -> ([c],[b])
zipUntil _ [] ys = ([],ys)
zipUntil f (x:xs) (y:ys) = first (f x y:) (zipUntil f xs ys)

>>> zipUntil (,) [1,2] "abc"
([(1,'a'),(2,'b')],"c")

Using this function in build looks like the following:

build (k,xs) ts = zipWith ($) ys ts2
  where
    (ys,ts2) = zipUntil Node xs (ts ++ repeat Leaf)

That top-level zipWith is also unnecessary, though. If we make the program circular, we can produce ts2 as we consume it, making the whole thing single-pass.

build xs ts = ys
  where
    (ys,ts2) = zip3Node xs (ts ++ repeat Leaf) ts2
    zip3Node (x:xs) (y:ys) ~(z:zs) = first (Node x y z:) (zip3Node xs ys zs) 
    zip3Node [] ys _ = ([], ys)

That zip3Node is a good candidate for rewriting as a fold, also, making the whole thing look like this:

rows k [] = []
rows k xs = take k xs : rows (2*k) (drop k xs)

build xs ts = ys
  where
    (ys,zs) = foldr f b xs ts zs
    f x xs (y:ys) ~(z:zs) = first (Node x y z:) (xs ys zs) 
    b ys _ = ([],ys)
    
fromList = head . foldr build (repeat Leaf) . rows 1

To fuse all of those definitions, we first will need to rewrite rows as a fold:

rows xs = uncurry (:) (foldr f b xs 1 2)
  where
    b _ _ = ([],[])
    f x k 0 j = ([], uncurry (:) (f x k j (j*2)))
    f x k i j = first (x:) (k (i-1) j)

Once we have everything as a fold, the rest of the transformation is pretty mechanical. At the end of it all, we get the following linear-time function for constructing a Braun tree from a list:

fromList :: [a] -> Tree a
fromList xs = head (l (foldr f b xs 1 2))
  where
    b _ _ ys zs = (repeat Leaf, (repeat Leaf, ys))
    
    l k = let (xs, ys) = uncurry k ys in xs
    
    f x k 0 j ys zs           = ([], (l (f x k j (j*2)), ys))
    f x k i j ~(y:ys) ~(z:zs) = first (Node x y z:) (k (i-1) j ys zs)

References

Bird, R. S. 1984. “Using Circular Programs to Eliminate Multiple Traversals of Data.” Acta Inf. 21 (3) (October): 239–250. doi:10.1007/BF00264249.

Danvy, Olivier, and Mayer Goldberg. 2005. “There and Back Again.” Fundamenta Informaticae 66 (4) (December): 397–413. https://cs.au.dk/~danvy/DSc/08_danvy-goldberg_fi-2005.pdf.

Okasaki, Chris. 1997. “Three Algorithms on Braun Trees.” Journal of Functional Programming 7 (6) (November): 661–666. doi:10.1017/S0956796897002876.

Sible, Joseph. 2019. “How can two continuations cancel each other out?” Stack Overflow. https://stackoverflow.com/questions/56122022/how-can-two-continuations-cancel-each-other-out.

List Syntax in Agda

Donnacha Oisín Kidney — Sat, 20 Apr 2019 00:00:00 UT

Posted on April 20, 2019

Tags: Agda

Just some silly examples of how to get a nice list syntax with mixfix operators in Agda.

Intro and Imports

Probability Monads in Cubical Agda

Donnacha Oisín Kidney — Wed, 17 Apr 2019 00:00:00 UT

Posted on April 17, 2019

Tags: Agda, Probability

Cubical Agda has just come out, and I’ve been playing around with it for a bit. There’s a bunch of info out there on the theory of cubical types, and Homotopy Type Theory more generally (cubical type theory is kind of like an “implementation” of Homotopy type theory), but I wanted to make a post demonstrating cubical Agda in practice, and one of its cool uses from a programming perspective.

So What is Cubical Agda?

I don’t really know! Cubical type theory is quite complex (even for a type theory), and I’m not nearly qualified to properly explain it. In lieu of a proper first-principles explanation, then, I’ll try and give a few examples of how it differs from normal Agda, before moving on to the main example of this post.

Imports

Permutations By Sorting

Donnacha Oisín Kidney — Sun, 24 Mar 2019 00:00:00 UT

Posted on March 24, 2019

Tags: Haskell, Agda

A naive—and wrong—way to shuffle a list is to assign each element in the list a random number, and then sort it. It might not be immediately obvious why: Kiselyov (2002) has a good explanation as to the problem. One way to think about it is like this: choosing $n$ random numbers each in the range $[0,n)$ has $n^n$ possible outcomes, whereas there are $n!$ permutations. Since these don’t necessarily divide evenly into each other, you’re going to have some bias.

Factorial Numbers

The first part of the fix is to figure out a way to get some random data that has only $n!$ possible values. The trick here will be to mimic the structure of a factorial itself: taking $n = 5$ , the previous technique would have yielded:

$5 \times 5 \times 5 \times 5 \times 5 = 5^5$

possible values. But we want:

$5 \times 4 \times 3 \times 2 \times 1 = 5!$

The solution is simple, then! Simply decrement the range by one for each position in the output list. In Haskell:

nums :: Int -> IO [Int]
nums 0 = pure []
nums n = (:) <$> randomR (0,n) <*> nums (n-1)

As an aside, what we’ve done here is constructed a list of digits in the factorial number system.

Sorts

Unfortunately, while we’ve figured out a way to get properly distributed random data, we can’t yet sort it to shuffle our list. If we look at the 6 factorial numbers generated for $n = 5$ , we can see the problem:

Different values in the list will produce the same sort: 100 and 200, for instance.

Lehmer Codes

We need a way to map the numbers above to a particular permutations: that’s precisely the problem solved by Lehmer codes. For the numbers 110, we can think of each digit as the relative position to put that item from the string into. Some Haskell code might make it clear:

insert :: Int -> a -> [a] -> [a]
insert 0 x xs = x : xs
insert i x (y:ys) = y : insert (i-1) x ys

shuffle :: [a] -> [Int] -> [a]
shuffle xs ys = foldr (uncurry insert) [] (zip ys xs)

And we can step through its execution:

shuffle "abc" [1,1,0]
foldr (uncurry insert) [] [(1,'a'),(1,'b'),(0,'c')]
insert 1 'a' (insert 1 'b' (insert 0 'c' []))
insert 1 'a' (insert 1 'b' "c")
insert 1 'a' "cb"
'c' : insert 0 'a' "b"
"cab"

Dualities of Sorts

Notice the similarity of the function above to a standard insertion sort:

insert :: Ord a => a -> [a] -> [a]
insert x [] = x : []
insert x (y:ys)
 | x <= y = x : y : ys
 | otherwise = y : insert x ys

insertSort :: Ord a => [a] -> [a]
insertSort = foldr insert []

The “comparison” is a little strange—we have to take into account relative position—but the shape is almost identical. Once I spot something like that, my first thought is to see if the relationship extends to a better $\mathcal{O}(n \log n)$ sort, but there’s something else I’d like to look at first.

“A Duality of Sorts” (Hinze, Magalhães, and Wu 2013) is a paper based on the interesting symmetry between insertion sort and selection sort [There’s also a video of Graham Hutton explaining the idea; Haran (2016)].

With that paper in mind, can we rewrite shuffle as a selection-based algorithm? We can indeed!

pop :: [(Int,a)] -> Maybe (a, [(Int,a)])
pop [] = Nothing
pop ((0,x):xs) = Just (x, xs)
pop ((i,x):xs) = (fmap.fmap) ((i-1,x):) (pop xs)

shuffle :: [a] -> [Int] -> [a]
shuffle xs ys = unfoldr pop (zip ys xs)

While the symmetry is pleasing, the paper details how to make the relationship explicit, using the same function for both selection and insertion sort:

swop Nil = Nil
swop (Cons a (x , Nil)) = Cons a (Left x)
swop (Cons a (x , Cons b x'))
  | fst a == 0 = Cons a (Left x)
  | otherwise  = Cons b (Right (Cons (first pred a) x'))
  
ishuffle :: [(Int,a)] -> [(Int,a)]
ishuffle = cata (apo (swop . fmap (id &&& project)))

sshuffle :: [(Int,a)] -> [(Int,a)]
sshuffle = ana (para (fmap (id ||| embed) . swop))

Improved Efficiency

So now we have to upgrade our sorts: in the paper, merge sort is the more efficient sort chosen, similarly to what I chose previously.

merge [] ys = ys
merge xs [] = xs
merge ((x,i):xs) ((y,j):ys)
  | i <= j    = (x,i) : merge xs ((y,j-i):ys)
  | otherwise = (y,j) : merge ((x,i-j-1):xs) ys
  
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold f = go
  where
    go x [] = x
    go a (b:l) = go (f a b) (pairMap l)
    pairMap (x:y:rest) = f x y : pairMap rest
    pairMap xs = xs
    
shuffle xs inds = map fst $ treeFold merge [] $ map pure $ zip xs inds

However, I feel like merge sort is an upgrade of insertion sort, not selection sort. Indeed, if you do the “split” step of merge sort badly, i.e. by splitting very unevenly, merge sort in fact becomes insertion sort!

So there’s a missing bit of this table:

	Insertion	Selection
$\mathcal{O}(n^2)$	Insertion sort	Selection sort
$\mathcal{O}(n \log n)$	Merge sort	???

I think it’s clear that quicksort is the algorithm that fits in there: again, done badly it degrades to selection sort (if you intentionally pick the pivot to be the worst element possible, i.e. the smallest element).

There are more symmetries: merge sort splits the lists using their structure, and merges them using the ordering of the elements. Quicksort is the opposite, merging by concatenation, but splitting using order. Finally, in merge sort adjacent elements are in the correct order after the recursive call, but the two sides of the split are not. Again, quicksort is precisely the opposite: adjacent elements have not been compared (before the recursive call), but the two sides of the split are correctly ordered.

Anyway, I haven’t yet formalised this duality (and I don’t know if I can), but we can use it to produce a quicksort-based shuffle algorithm:

partition = foldr f (const ([],[]))
  where
    f (y,j) ys i
      | i <= j    = fmap  ((y,j-i):) (ys i)
      | otherwise = first ((y,j):) (ys (i-1))
      
shuffle :: [a] -> [Int] -> [a]
shuffle xs ys = go (zip xs ys)
  where
    go [] = []
    go ((x,i):xs) = case partition xs i of
        (ls,rs) -> go ls ++ [x] ++ go rs

That’s all for this post! The algorithms can all be translated into Agda or Idris: I’m currently working on a way to represent permutations that isn’t $\mathcal{O}(n^2)$ using them. If I figure out a way to properly dualise quicksort and merge sort I’ll do a small write up as well (I’m currently working my way through Hinze et al. 2012 for ideas). Finally, I’d like to explore some other sorting algorithms as permutation algorithms: sorting networks seem especially related to “permutations by swapping”.

References

Haran, Brady. 2016. “Sorting Secret.” https://www.youtube.com/watch?v=pcJHkWwjNl4.

Hinze, Ralf, Daniel W. H. James, Thomas Harper, Nicolas Wu, and José Pedro Magalhães. 2012. “Sorting with bialgebras and distributive laws.” In Proceedings of the 8th ACM SIGPLAN workshop on Generic programming - WGP ’12, 69. Copenhagen, Denmark: ACM Press. doi:10.1145/2364394.2364405.

Hinze, Ralf, José Pedro Magalhães, and Nicolas Wu. 2013. “A Duality of Sorts.” In The Beauty of Functional Code: Essays Dedicated to Rinus Plasmeijer on the Occasion of His 61st Birthday, ed by. Peter Achten and Pieter Koopman, 151–167. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg. doi:10.1007/978-3-642-40355-2_11.

Kiselyov, Oleg. 2002. “Provably perfect random shuffling and its pure functional implementations.” http://okmij.org. http://okmij.org/ftp/Haskell/AlgorithmsH.html#perfect-shuffle.

Lazy Binary Numbers

Donnacha Oisín Kidney — Thu, 21 Mar 2019 00:00:00 UT

Posted on March 21, 2019

Part 1 of a 1-part series on Binary Numbers

Tags: Agda, Haskell

Number Representations

When working with numbers in Agda, we usually use the following definition:

data N = Z | S N deriving (Eq, Ord)

instance Num N where
    Z + n = n
    S n + m = S (n + m)

    Z * m = Z
    S n * m = m + n * m

data ℕ : Set where
  zero : ℕ
  suc : ℕ → ℕ

_+_ : ℕ → ℕ → ℕ
zero  + y = y
suc x + y = suc (x + y)

_*_ : ℕ → ℕ → ℕ
zero  * y = zero
suc x * y = y + (x * y)

Haskell

Agda

In Haskell it’s less common, for obvious reasons:

Operation	Complexity
$n + m$	$\mathcal{O}(n)$
$n \times m$	$\mathcal{O}(nm)$

Why use them at all, then? Well, in Agda, we need them so we can prove things about the natural numbers. Machine-level integers are fast, but they’re opaque: their implementation isn’t written in Agda, and therefore it’s not available for the compiler to reason about.

In Haskell, they occasionally find uses due to their laziness. This can help in Agda as well. By lazy here I mean that operations on them don’t have to inspect the full structure before giving some output.

>>> Z < S undefined
True

*-zeroˡ : ∀ x → zero * x ≡ zero
*-zeroˡ x = refl

In Haskell, as we can see, this lets us run computations without scrutinising some arguments. Agda benefits similarly: here it lets the compiler see more “obvious” facts that it may have missed otherwise.

It’s not completely lazy, though. In particular, it tends to be left-biased:

>>> undefined * Z == Z
** Exception: Prelude.undefined

*-zeroʳ : ∀ x → x * zero ≡ zero
*-zeroʳ x = refl
-- x * zero != zero of type ℕ

Like Boolean short-circuiting operators, operations on Peano numbers will usually have to scrutinise the left-hand-side argument quite a bit before giving an output.

So, Peano numbers are good because:

We can prove things about them.
They’re lazy.

In this post, I’m going to look at some other number representations that maintain these two desirable properties, while improving on the efficiency somewhat.

List-of-Bits-Binary

The first option for an improved representation is binary numbers. We can represent binary numbers as a list of bits:

data Bit = O | I deriving (Eq, Show, Ord)

type B = [Bit]

data Bit : Set where O I : Bit

𝔹 : Set
𝔹 = List Bit

As we’re using these to represent natural numbers, we’ll need to define a way to convert between them:

eval :: B -> N
eval = foldr f Z
  where
    f O xs = xs + xs
    f I xs = S (xs + xs)

inc :: B -> B
inc [] = [I]
inc (O:xs) = I : xs
inc (I:xs) = O : inc xs

fromN :: N -> B
fromN Z = []
fromN (S n) = inc (fromN n)

⟦_⇓⟧ : 𝔹 → ℕ
⟦_⇓⟧ = foldr (λ { O xs → xs + xs
                ; I xs → suc (xs + xs) })
             zero

inc : 𝔹 → 𝔹
inc [] = I ∷ []
inc (O ∷ xs) = I ∷ xs
inc (I ∷ xs) = O ∷ inc xs

⟦_⇑⟧ : ℕ → 𝔹
⟦ zero  ⇑⟧ = []
⟦ suc n ⇑⟧ = inc ⟦ n ⇑⟧

And here we run into our first problem: redundancy. There are multiple ways to represent the same number according to the semantics defined above. We can actually prove this in Agda:

redundant : ∃₂ λ x y → x ≢ y × ⟦ x ⇓⟧ ≡ ⟦ y ⇓⟧
redundant = [] , O ∷ [] , (λ ()) , refl

In English: “There are two binary numbers which are not the same, but which do evaluate to the same natural number”. (This proof was actually automatically filled in for me after writing the signature)

This represents a huge problem for proofs. It means that even simple things like $x \times 0 = 0$ aren’t true, depending on how multiplication is implemented. On to our next option:

List-of-Gaps-Binary

Instead of looking at the bits directly, let’s think about a binary number as a list of chunks of 0s, each followed by a 1. In this way, we simply can’t have trailing zeroes, because the definition implies that every number other than 0 ends in 1.

data Gap = Z | S Gap
type B = [Gap]

𝔹 : Set
𝔹 = List ℕ

This guarantees a unique representation. As in the representation above, it has much improved time complexities for the familiar operations:

Operation	Complexity
$n + m$	$\mathcal{O}(\log_2 n)$
$n \times m$	$\mathcal{O}(\log_2 (n + m))$

Encoding the zeroes as gaps also makes multiplication much faster in certain cases: multiplying by a high power of 2 is a constant-time operation, for instance.

It does have one disadvantage, and it’s to do with the increment function:

inc :: B -> B
inc = uncurry (flip (:)) . inc'
  where
    inc' [] = ([], Z)
    inc' (x:xs) = inc'' x xs
    
    inc'' Z ns = fmap S (inc' ns)
    inc'' (S n) ns = (n:ns,Z)

𝔹⁺ : Set
𝔹⁺ = ℕ × 𝔹

inc : 𝔹 → 𝔹
inc = uncurry _∷_ ∘ inc′
  module Inc where
  mutual
    inc′ : 𝔹 → 𝔹⁺
    inc′ [] = 0 , []
    inc′ (x ∷ xs) = inc″ x xs

    inc″ : ℕ → 𝔹 → 𝔹⁺
    inc″ zero ns = map₁ suc (inc′ ns)
    inc″ (suc n) ns = 0 , n ∷ ns

With all of their problems, Peano numbers performed this operation in constant time. The above implementation is only amortised constant-time, though, with a worst case of $\mathcal{O}(\log_2 n)$ (same as the list-of-bits version). There are a number of ways to remedy this, the most famous being:

Skew Binary

This encoding has three digits: 0, 1, and 2. To guarantee a unique representation, we add the condition that there can be at most one 2 in the number, which must be the first non-zero digit if it’s present.

To represent this we’ll encode “gaps”, as before, with the condition that if the second gap is 0 it actually represents a 2 digit in the preceding position. That weirdness out of the way, we are rewarded with an inc implementation which is clearly $\mathcal{O}(1)$ .

inc :: B -> B
inc [] = Z : []
inc (x:[]) = Z : x : []
inc (x  : Z : xs) = S x : xs
inc (x1 : S x2 : xs) = Z : x1 : x2 : xs

inc : 𝔹 → 𝔹
inc [] = 0 ∷ []
inc (x ∷ []) = 0 ∷ x ∷ []
inc (x₁ ∷ zero ∷ xs) = suc x₁ ∷ xs
inc (x₁ ∷ suc x₂ ∷ xs) = 0 ∷ x₁ ∷ x₂ ∷ xs

Unfortunately, though, we’ve lost the other efficiencies! Addition and multiplication have no easy or direct encoding in this system, so we have to convert back and forth between this and regular binary to perform them.

List-of-Segments-Binary

The key problem with incrementing in the normal binary system is that it can cascade: when we hit a long string of 1s, all the 1s become 0 followed by a single 1. We can turn this problem to our advantage if we use a representation which encodes both 1s and 0s as strings of gaps. We’ll have to use a couple more tricks to ensure a unique representation, but all in all this is what we have (switching to just Agda now):

data 0≤_ (A : Set) : Set where
  0₂  : 0≤ A
  0<_ : A → 0≤ A

mutual
  record 𝔹₀ : Set where
    constructor _0&_
    inductive
    field
      H₀ : ℕ
      T₀ : 𝔹₁

  record 𝔹₁ : Set where
    constructor _1&_
    inductive
    field
      H₁ : ℕ
      T₁ : 0≤  𝔹₀
open 𝔹₀ public
open 𝔹₁ public

data 𝔹⁺ : Set where
  B₀_ : 𝔹₀ → 𝔹⁺
  B₁_ : 𝔹₁ → 𝔹⁺

𝔹 : Set
𝔹 = 0≤ 𝔹⁺

inc⁺ : 𝔹 → 𝔹⁺
inc⁺ 0₂                               =      B₁ 0     1& 0₂
inc⁺ (0< B₀ zero  0& y 1& xs        ) =      B₁ suc y 1& xs
inc⁺ (0< B₀ suc x 0& y 1& xs        ) =      B₁ 0     1& 0< x 0& y 1& xs
inc⁺ (0< B₁ x 1& 0₂                 ) = B₀ x 0& 0     1& 0₂
inc⁺ (0< B₁ x 1& 0< zero  0& z 1& xs) = B₀ x 0& suc z 1& xs
inc⁺ (0< B₁ x 1& 0< suc y 0& z 1& xs) = B₀ x 0& 0     1& 0< y 0& z 1& xs

inc : 𝔹 → 𝔹
inc x = 0< inc⁺ x

Perfect! Increments are obviously $\mathcal{O}(1)$ , and we’ve guaranteed a unique representation.

I’ve been working on this type for a couple of days, and you can see my code here. So far, I’ve done the following:

Defined inc, addition, and multiplication

These were a little tricky to get right (addition is particularly hairy), but they’re all there, and maximally lazy.

Proved Homomorphism

For each one of the functions, you want them to correspond precisely to the equivalent functions on Peano numbers. Proving that fact amounts to filling in definitions for the following:

inc-homo : ∀ x → ⟦ inc x ⇓⟧ ≡ suc ⟦ x ⇓⟧
+-homo : ∀ x y → ⟦ x + y ⇓⟧ ≡ ⟦ x ⇓⟧ + ⟦ y ⇓⟧
*-homo : ∀ x y → ⟦ x * y ⇓⟧ ≡ ⟦ x ⇓⟧ * ⟦ y ⇓⟧

Proved Bijection

As we went to so much trouble, it’s important to prove that these numbers form a one-to-one correspondence with the Peano numbers. As well as that, once done, we can use it to prove facts about the homomorphic functions above, by reusing any proofs about the same functions on Peano numbers. For instance, here is a proof of commutativity of addition:

+-comm : ∀ x y → x + y ≡ y + x
+-comm x y = injective (+-homo x y ⟨ trans ⟩
                        ℕ.+-comm ⟦ x ⇓⟧ ⟦ y ⇓⟧ ⟨ trans ⟩
                        sym (+-homo y x))

Applications

So now that we have our nice number representation, what can we do with it? One use is as a general-purpose number type in Agda: it represents a good balance between speed and “proofiness”, and Coq uses a similar type in its standard library.

There are other, more unusual uses of such a type, though.

Data Structures

It’s a well-known technique to build a data structure out of some number representation (Hinze 1998): in fact, all of the representations above are explored in Okasaki (1999, chap. 9.2).

Logic Programming

Logic programming languages like Prolog let us write programs in a backwards kind of way. We say what the output looks like, and the unifier will figure out the set of inputs that generates it.

In Haskell, we have a very rough approximation of a similar system: the list monad.

pyth :: [(Int,Int,Int)]
pyth = do
  x <- [1..10]
  y <- [1..10]
  z <- [1..10]
  guard (x*x + y*y == z*z)
  return (x,y,z)

There are tons of inefficiencies in the above code: for us, though, we can look at one: the number representation. In the equation:

$x^2 + y^2 = z^2$

If we know that $x$ and $y$ are both odd, then $z$ must be even. If the calculation of the equation is expensive, this is precisely the kind of shortcut we’d want to take advantage of. Luckily, our binary numbers do just that: it is enough to scrutinise just the first bits of $x$ and $y$ in order to determine the first bit of the output.

After seeing that example, you may be thinking that lazy evaluation is a perfect fit for logic programming. You’re not alone! Curry (Hanus (ed.) 2016) is a lazy, functional logic programming language, with a similar syntax to Haskell. It also uses lazy binary numbers to optimise testing.

Lazy Predicates

In order for queries to be performed efficiently on binary numbers, we will also need a way to describe lazy predicates on them. A lot of these predicates are more easily expressible on the list-of-bits representation above, so we’ll be working with that representation for this bit. Not to worry, though: we can convert from the segmented representation into the list-of-bits, and we can prove that the conversion is injective:

toBits-injective : ∀ xs ys → toBits xs ≡ toBits ys → xs ≡ ys

Here’s the curious problem: since our binary numbers are expressed least-significant-bit-first, we have to go to the end before knowing which is bigger. Luckily, we can use one of my favourite Haskell tricks, involving the ordering monoid:

data Ordering : Set where
  lt eq gt : Ordering

_∙_ : Ordering → Ordering → Ordering
lt ∙ y = lt
eq ∙ y = y
gt ∙ y = gt

cmpBit : Bit → Bit → Ordering
cmpBit O O = eq
cmpBit O I = lt
cmpBit I O = gt
cmpBit I I = eq

compare : Bits → Bits → Ordering
compare [] [] = eq
compare [] (_ ∷ _) = lt
compare (_ ∷ _) [] = gt
compare (x ∷ xs) (y ∷ ys) = compare xs ys ∙ cmpBit x y

Thanks to laziness, this function first compares the length of the lists, and then does a lexicographical comparison in reverse only if the lengths are the same. This is exactly what we want for our numbers.

Future Posts

That’s all I have for now, but I’m interested to formalise the laziness of these numbers in Agda. Usually that’s done with coinduction: I would also like to see the relationship with exact real arithmetic.

I wonder if it can be combined with O’Connor (2016) to get some efficient proof search algorithms, or with Escardo (2014) to get more efficient exhaustive search.

References

Escardo, Martin. 2014. “Seemingly impossible constructive proofs | Mathematics and Computation.” Mathematics and Computation. http://math.andrej.com/2014/05/08/seemingly-impossible-proofs/.

Hanus (ed.), M. 2016. Curry: An Integrated Functional Logic Language (Vers. 0.9.0). Available at http://www.curry-language.org. https://www-ps.informatik.uni-kiel.de/currywiki/.

Hinze, Ralf. 1998. Numerical Representations as Higher-Order Nested Datatypes. Institut für Informatik III, Universität Bonn. http://www.cs.ox.ac.uk/ralf.hinze/publications/\#R5.

O’Connor, Liam. 2016. “Applications of Applicative Proof Search.” In Proceedings of the 1st International Workshop on Type-Driven Development, 43–55. TyDe 2016. New York, NY, USA: ACM. doi:10.1145/2976022.2976030. http://doi.acm.org/10.1145/2976022.2976030.

Okasaki, Chris. 1999. Purely Functional Data Structures. Cambridge University Press.

More Agda Tips

Donnacha Oisín Kidney — Thu, 14 Mar 2019 00:00:00 UT

Posted on March 14, 2019

Part 2 of a 2-part series on Agda Tips

Tags: Agda

Literate Agda

For including Agda code in LaTeX files, Agda’s built-in literate programming support is a great tool. It typesets code well, and ensures that it typechecks which can help avoid typos.

Embedding Agda Code in LaTeX

I write the LaTeX document in one file, and the Agda code in another .lagda file. Using the catchfilebetweentags LaTeX package, I can then embed snippets of the Agda code into the LaTeX document. For instance, in a file named Lists.lagda I can have the following:

%<*head-type>
\begin{code}
head : List A → Maybe A
\end{code}
%

\begin{code}
head [] = nothing
head (x ∷ xs) = just x
\end{code}

Then, after compiling the Agda file with agda --latex --output-dir=. Lists.lagda, I can embed the snippet head : List A → Maybe A into the TeX file like so:

\ExecuteMetaData[Lists.tex]{head-type}

Dealing with Unicode

Most Agda source code will be Unicode-heavy, which doesn’t work well in LaTeX. There are a few different ways to deal with this: you could use XeTeX, which handles Unicode better, for instance. I found it easier to use the ucs package, and write a declaration for each Unicode character as I came across it. For the ∷ character above, for instance, you can write:

\usepackage{ucs}
\DeclareUnicodeCharacter{8759}{\ensuremath{\squaredots}}

Live Reloading

For plain LaTeX code, I use Spacemacs and Skim to get live reloading. When I save the LaTeX source code, the Skim window refreshes and jumps to the point my editing cursor is at. I use elisp code from this blog post.

For Agda code, live reloading gets a little trickier. If I edit an Agda source file, the LaTeX won’t automatically recompile it. However, based on this stack exchange answer, you can put the following .latexmkrc file in the same directory as your .lagda files and your .tex file:

add_cus_dep('lagda','tex',0,'lagda2tex');

sub lagda2tex {
    my $base = shift @_;
    return system('agda', '--latex', '--latex-dir=.', "$base.lagda");
}

This will recompile the literate Agda files whenever they’re changed. Unfortunately, it doesn’t automate it the first time you do it: it needs to see the .tex files to see the dependency. You can fix this yourself, by running agda --latex --output-dir=. when you add a new .lagda file (just once, after that the automation will take over), or you can use a script like the following:

#!/bin/bash
find . -type f -name '*.lagda' | while read -r code ; do
    dir=$(dirname "$code")
    file=$(basename "$code" .lagda).tex
    if [ ! -e "$dir/$file" ]
    then
        agda --latex --latex-dir=. "$code"
    fi
done

This will compile any .lagda file it finds that doesn’t have a corresponding .tex file (so it won’t slow things down). Then call that script on the first line of your .latexmkrc, like so:

system("bash ./init-missing-lagda.sh");
add_cus_dep('lagda','tex',0,'lagda2tex');

sub lagda2tex {
    my $base = shift @_;
    return system('agda', '--latex', '--latex-dir=.', "$base.lagda");
}

Flags for Debugging

There are a number of undocumented flags you can pass to Agda which are absolutely invaluable when it comes to debugging. One of them can tell you more about termination checking, another reports on type checking (tc), another for profiling (profile), and so on. Set the verbosity level (agda -v 100) to get more or less info.

Type Checking Order

Agda does type checking from left to right. This isn’t always desired: as an example, if we want to annotate a value with its type, we can use the following function:

the : ∀ {a} (A : Set a) → A → A
the _ x = x

example : _
example = the ℕ 3

Coming from Haskell, though, this is the wrong way around. We usually prefer to write something like 3 :: Int. We can’t write that as a simple function in Agda, though, so we instead use a syntax declaration:

syntax the ty x = x ∷ ty

example : _
example = 3 ∷ ℕ

Changing the order of type checking can also speed up typechecking in some cases. There’s more information about syntax declarations in Agda’s documentation.

Finger Trees in Agda

Donnacha Oisín Kidney — Mon, 25 Feb 2019 00:00:00 UT

Posted on February 25, 2019

Tags: Agda

This Post is Available With Clickable Code Here

This whole post is written with clickable identifiers and ascii art at the above link. I also provide the normal version below in case there are any problems rendering.

As I have talked about previously, a large class of divide-and conquer algorithms rely on “good” partitioning for the divide step. If you then want to make the algorithms incremental, you keep all of those partitions (with their summaries) in some “good” arrangement (Mu, Chiang, and Lyu 2016). Several common data structures are designed around this principle: binomial heaps, for instance, store partitions of size $2^n$ . Different ways of storing partitions favours different use cases: switch from a binomial heap to a skew binomial, for instance, and you get constant-time cons.

The standout data structure in this area is Hinze and Paterson’s finger tree (Hinze and Paterson 2006). It caches summaries in a pretty amazing way, allowing for (amortised) $\mathcal{O}(1)$ cons and snoc and $\mathcal{O}(\log n)$ split and append. These features allow it to be used for a huge variety of things: Data.Sequence uses it as a random-access sequence, but it can also work as a priority queue, a search tree, a priority search tree (Hinze 2001), an interval tree, an order statistic tree…

All of these applications solely rely on an underlying monoid. As a result, I thought it would be a great data structure to implement in Agda, so that you’d get all of the other data structures with minimal effort [similar thinking motivated a Coq implementation; Sozeau (2007)].

Scope of the Verification

There would be no real point to implementing a finger tree in Agda if we didn’t also prove some things about it. The scope of the proofs I’ve done so far are intrinsic proofs of the summaries in the tree. In other words, the type of cons is as follows:

cons : ∀ x {xs} → Tree xs → Tree (μ x ∙ xs)

This is enough to prove things about the derived data structures (like the correctness of sorting if it’s used as a priority queue), but it’s worth pointing out what I haven’t proved (yet):

Invariants on the structure (“safe” and “unsafe” digits and so on).
The time complexity or performance of any operations.

To be honest, I’m not even sure that my current implementation is correct in these regards! I’ll probably have a go at proving them in the future (possibly using Danielsson 2008).

Monoids and Proofs

The bad news is that finger trees are a relatively complex data structure, and we’re going to need a lot of proofs to write a verified version. The good news is that monoids (in contrast to rings) are extremely easy to prove automatically. In this project, I used reflection to do so, but I think it should be possible to do with instance resolution also.

Measures

First things first, we need a way to talk about the summaries of elements we’re interested in. This is captured by the following record type:

record σ {a} (Σ : Set a) : Set (a ⊔ r) where
  field
    μ : Σ → 𝓡
    
open σ ⦃ ... ⦄

𝓡 is the type of the summaries, and μ means “summarise”. The silly symbols are used for brevity: we’re going to be using this thing everywhere, so it’s important to keep it short. Here’s an example instance for lists:

instance
  σ-List : ∀ {a} {Σ : Set a} → ⦃ _ : σ Σ ⦄ → σ (List Σ)
  μ ⦃ σ-List ⦄ = List.foldr (_∙_ ∘ μ) ε

Working With Setoids

As I mentioned, the tree is going to be verified intrinsically. In other word its type will look something like this:

Tree : 𝓡 → Set

But before running off to define that the obvious way, I should mention that I made the annoying decision to use a setoid (rather than propositional equality) based monoid. This means that we don’t get substitution, making the obvious definition untenable.

I figured out a solution to the problem, but I’m not sure if I’m happy with it. That’s actually the main motivation for writing this post: I’m curious if other people have better techniques for this kind of thing.

To clarify: “this kind of thing” is writing intrinsic (correct-by-construction) proofs when a setoid is involved. Intrinsic proofs usually lend themselves to elegance: to prove that map preserves a vector’s length, for instance, basically requires no proof at all:

map : ∀ {a b n} {A : Set a} {B : Set b}
    → (A → B)
    → Vec A n
    → Vec B n
map f [] = []
map f (x ∷ xs) = f x ∷ map f xs

But that’s because pattern matching works well with propositional equality: in the first clause, n is set to 0 automatically. If we were working with setoid equality, we’d instead maybe get a proof that n ≈ 0, and we’d have to figure a way to work that into the types.

Fibres

The first part of the solution is to define a wrapper type which stores information about the size of the thing it contains:

record μ⟨_⟩≈_ {a} (Σ : Set a) ⦃ _ : σ Σ ⦄ (𝓂 : 𝓡) : Set (a ⊔ r ⊔ m) where
  constructor _⇑[_]
  field
    𝓢 : Σ
    𝒻 : μ 𝓢 ≈ 𝓂

Technically speaking, I think this is known as a “fibre”. μ⟨ Σ ⟩≈ 𝓂 means “There exists a Σ such that μ Σ ≈ 𝓂”. Next, we’ll need some combinators to work with:

infixl 2 _≈[_]
_≈[_] : ∀ {a} {Σ : Set a} ⦃ _ : σ Σ ⦄ {x : 𝓡} → μ⟨ Σ ⟩≈ x → ∀ {y} → x ≈ y → μ⟨ Σ ⟩≈ y
𝓢 (xs ≈[ y≈z ]) = 𝓢 xs
𝒻 (xs ≈[ y≈z ]) = trans (𝒻 xs) y≈z

This makes it possible to “rewrite” the summary, given a proof of equivalence.

Do Notation

The wrapper on its own isn’t enough to save us from hundreds of lines of proofs. Once you do computation on its contents, you still need to join it up with its original proof of equivalence. In other words, you’ll need to drill into the return type of a function, find the place you used the relevant type variable, and apply the relevant proof from the type above. This can really clutter proofs. Instead, we can use Agda’s new support for do notation to try and get a cleaner notation for everything. Here’s a big block of code:

infixl 2 arg-syntax
record Arg {a} (Σ : Set a) ⦃ _ : σ Σ ⦄ (𝓂 : 𝓡) (f : 𝓡 → 𝓡) : Set (m ⊔ r ⊔ a) where
  constructor arg-syntax
  field
    ⟨f⟩ : Congruent₁ f
    arg : μ⟨ Σ ⟩≈ 𝓂
open Arg

syntax arg-syntax (λ sz → e₁) xs = xs [ e₁ ⟿ sz ]

infixl 1 _>>=_
_>>=_ : ∀ {a b} {Σ₁ : Set a} {Σ₂ : Set b} ⦃ _ : σ Σ₁ ⦄ ⦃ _ : σ Σ₂ ⦄ {𝓂 f}
      → Arg Σ₁ 𝓂 f
      → ((x : Σ₁) → ⦃ x≈ : μ x ≈ 𝓂 ⦄ → μ⟨ Σ₂ ⟩≈ f (μ x))
      → μ⟨ Σ₂ ⟩≈ f 𝓂
arg-syntax cng xs >>= k = k (𝓢 xs) ⦃ 𝒻 xs ⦄ ≈[ cng (𝒻 xs) ]

First, we define a wrapper for types parameterised by their summary, with a way to lift an underlying equality up into some expression f. The >>= operator just connects up all of the relevant bits. An example is what’s needed:

listToTree : ∀ {a} {Σ : Set a} ⦃ _ : σ Σ ⦄ → (xs : List Σ) → μ⟨ Tree Σ ⟩≈ μ xs
listToTree [] = empty ⇑
listToTree (x ∷ xs) = [ ℳ ↯ ]≈ do
  ys ← listToTree xs [ μ x ∙> s ⟿ s ]
  x ◂ ys

The first line is the base case, nothing interesting going on there. The second line begins the do-notation, but first applies [ ℳ ↯ ]≈: this calls the automated solver. The second line makes the recursive call, and with the syntax:

[ μ x ∙> s ⟿ s ]

It tells us where the size of the bound variable will end up in the outer expression.

Danielsson, Nils Anders. 2008. “Lightweight Semiformal Time Complexity Analysis for Purely Functional Data Structures.” In Proceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 133–144. POPL ’08. New York, NY, USA: ACM. doi:10.1145/1328438.1328457.

Hinze, Ralf. 2001. “A Simple Implementation Technique for Priority Search Queues.” In Proceedings of the 2001 International Conference on Functional Programming, 110–121. ACM Press. doi:10.1145/507635.507650.

Hinze, Ralf, and Ross Paterson. 2006. “Finger Trees: A Simple General-purpose Data Structure.” Journal of Functional Programming 16 (2): 197–217.

Mu, Shin-Cheng, Yu-Hsi Chiang, and Yu-Han Lyu. 2016. “Queueing and Glueing for Optimal Partitioning (Functional Pearl).” In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, 158–167. ICFP 2016. New York, NY, USA: ACM. doi:10.1145/2951913.2951923.

Sozeau, Matthieu. 2007. “Program-ing Finger Trees in Coq.” In Proceedings of the 12th ACM SIGPLAN International Conference on Functional Programming, 13–24. ICFP ’07. New York, NY, USA: ACM. doi:10.1145/1291151.1291156.

A New Ring Solver for Agda

Donnacha Oisín Kidney — Fri, 25 Jan 2019 00:00:00 UT

Posted on January 25, 2019

Tags: Agda

I’m finally a the point where I feel like I can make the project I’ve been working on for the past few months public: A Ring Solver for Agda. The focus of the project is ergonomics and ease-of-use: hopefully the interface to the solver is simpler and more friendly than the one that’s already there. It can do step-by-step solutions (like Wolfram Alpha). It’s also asymptotically faster than the old solver (and actually faster! The usual optimizations you might apply don’t actually work here, so this bit definitely took the most work).

Anyway, this work is all for my undergrad final year project, but I’m hoping to submit it to a conference or something in the next few weeks.

A Binomial Urn

Donnacha Oisín Kidney — Tue, 15 Jan 2019 00:00:00 UT

Posted on January 15, 2019

Part 3 of a 3-part series on Balanced Folds

Tags: Haskell

When we started the series, we wanted to find a “better” fold: one that was more balanced than either foldl or foldr (in its placement of parentheses). Both of these are about as unbalanced as you can get:

>>> foldr (+) 0 [1,2,3]
1 + (2 + (3 + 0))
>>> foldl (+) 0 [1,2,3]
((0 + 1) + 2) + 3

The first better fold I found was Jon Fairbairn’s simple treeFold:

treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold f = go
  where
    go x [] = x
    go a  (b:l) = go (f a b) (pairMap l)
    pairMap (x:y:rest) = f x y : pairMap rest
    pairMap xs = xs
  
>>> treeFold (+) 0 [1,2,3]
(0 + 1) + (2 + 3)

Already this function was kind of magical: if your binary operator merges two sorted lists, foldr will give you insertion sort, whereas treeFold will give you merge sort; for summing floats, treeFold has a lower error growth than sum. By dividing up the work better, we were able to improve the characteristics of many algorithms automatically. We also saw that it could easily be made parallel:

parseq :: a -> b -> b
parseq a b =
    runST
        (bool (par a b) (seq a b) <$>
         unsafeIOToST (liftA2 (>) numSparks getNumCapabilities))

treeFoldParallel :: (a -> a -> a) -> a -> [a] -> a
treeFoldParallel f =
    treeFold
        (\l r ->
              r `parseq` (l `parseq` f l r))

In the next post, we saw how we could make the fold incremental, by using binary number representations for data structures. This let us do 2 things: it meant the fold was structurally terminating, so it would pass the termination checker (efficiently) in languages like Agda or Idris, and it meant we could write scanl using the fold. The scanl was also efficient: you could run the fold at any point in $\mathcal{O}(\log n)$ time, and work would be shared between subsequent runs. Effectively, this let us use it to solve greedy optimization problems. We also saw how it was effectively constructing an implicit binomial priority queue under the hood, and how it exploited laziness to get sharing.

I’ve gotten huge mileage out of this fold and the general ideas about it, and today I’m going to show one more use of it. We’re going to improve some of the asymptotics of the data structure presented in Lampropoulos, Spector-Zabusky, and Foner (2017).

A Random Urn

The paper opens with the problem:

Suppose you have an urn containing two red balls, four green balls, and three blue balls. If you take three balls out of the urn, what is the probability that two of them are green?

If you were to take just one ball out of the earn, calculating the associated probabilities would be easy. Once you get to the second, though, you have to update the previous probability based on what ball was removed. In other words, we need to be able to dynamically update the distribution.

Using lists, this would obviously become an $\mathcal{O}(n)$ operation. In the paper, an almost-perfect binary tree is used. This turns the operation into one that’s $\mathcal{O}(\log n)$ . The rest of the operations have the following complexities:

Operation	Complexity
`insert`	$\mathcal{O}(\log n)$
`remove`	$\mathcal{O}(\log n)$
`fromList`	$\mathcal{O}(n)$

As a quick spoiler, the improved version presented here has these complexities:

Operation	Complexity
`insert`	$\mathcal{O}(1)$
`remove`	$\mathcal{O}(\log n)$
`merge`	$\mathcal{O}(\log n)$
`fromList`	$\mathcal{O}(n)$

We add another operation (merge), which means that the new structure is viable as an instance of Alternative, Monad, and so on, making it an efficient monad for weighted backtracking search.

Priority Queues

The key thing to notice in the paper which will let us improve the structure is that what they’re designing is actually a priority queue. Well, a weird looking priority queue, but a priority queue nonetheless.

Think about it like a max-priority queue (pop returns the largest element first), with a degree of “randomization”. In other words, when you go to do a pop, all of the comparisons between the ordering keys (the weights in this case) sprinkles some randomness into the equation, meaning that instead of 1 < 2 returning True, it returns True $\frac{2}{3}$ of the time, and False the other $\frac{1}{3}$ .

This way of doing things means that not every priority queue is suitable: we want to run comparisons at pop time (not insert), so a binary heap (for instance) won’t do. At branches (non-leaves), the queue will only be allowed store summaries of the data, not the “max element”.

The one presented in the paper is something like a Braun priority queue: the $\mathcal{O}(n)$ fromList implementation is reminiscent of the one in Okasaki (1997).

So what priority queue can we choose to get us the desired efficiency? Why, a binomial one of course!

The Data Structure

The urn structure itself looks a lot like a binomial heap:

data Tree a
  = Tree
  { weight :: {-# UNPACK #-} !Word
  , branch :: Node a
  }

data Node a
  = Leaf a
  | Branch (Tree a) (Node a)

data Heap a
  = Nil
  | Cons {-# UNPACK #-} !Word (Tree a) (Heap a)
  
data Urn a =
    Urn {-# UNPACK #-} !Word
        !(Heap a)

By avoiding the usual Skip constructors you often see in a binomial heap we save a huge amount of space. Instead, we store the “number of zeroes before this bit”. Another thing to point out is that only left branches in the trees store their weight: the same optimization is made in the paper.

Insertion is not much different from insertion for a usual binomial priority queue, although we don’t need to do anything to merge the trees:

insertHeap :: Word -> a -> Heap a -> Heap a
insertHeap i' x' = go 0 (Tree i' (Leaf x'))
  where
    go !i x Nil = Cons i x Nil
    go !i x (Cons 0 y ys) = go (i+1) (mergeTree x y) ys
    go !i x (Cons j y ys) = Cons i x (Cons (j-1) y ys)

mergeTree :: Tree a -> Tree a -> Tree a
mergeTree xs ys =
  Tree
    (weight xs + weight ys)
    (Branch xs (branch ys))

insert :: Word -> a -> Urn a -> Urn a
insert i x (Urn w xs) = Urn (w+i) (insertHeap i x xs)

We could potentially get insertion from amortized $\mathcal{O}(1)$ to worst-case $\mathcal{O}(1)$ by using skew binary instead of binary (in fact I am almost sure it’s possible), but then I think we’d lose the efficient merge. I’ll leave exploring that for another day.

To get randomness, we’ll write a very simple class that encapsulates only what we need:

class Sample m where
    -- | Inclusive range
    inRange :: Word -> Word -> m Word

You can later instantiate this to whatever random monad you end up using. (The same approach was taken in the paper, although we only require Functor here, not Monad).

Sampling (with replacement) first randomly chooses a tree from the top-level list, and then we drill down into that tree with binary search.

sample :: (Functor m, Sample m) => Urn a -> Maybe (m a)
sample (Urn _ Nil) = Nothing
sample (Urn w' (Cons _ x' xs')) = Just (fmap (go x' xs') (inRange 0 (w' - 1)))
  where
    go x Nil !w = go' w (branch x)
    go x (Cons _ y ys) !w
      | w < weight x = go' w (branch x)
      | otherwise    = go y ys (w - weight x)
    go' !_ (Leaf x) = x
    go' !i (Branch xs ys)
      | i < weight xs = go' i (branch xs)
      | otherwise = go' (i - weight xs) ys

So we’re off to a good start, but remove is a complex operation. We take the same route taken in the paper: first, we perform an “uncons”-like operation, which pops out the last inserted element. Then, we randomly choose a point in the tree (using the same logic as in sample), and replace it with the popped element¹.

remove :: (Functor m, Sample m) => Urn a -> Maybe (m ((a, Word), Urn a))
remove (Urn w hp) = fmap go' (Heap.uninsert hp)
  where
    go' (vw,v,hp') = fmap (`go` hp') (inRange 0 (w-1))
      where
        go !_  Nil = ((v, vw), Urn 0 Nil)
        go !rw vs@(Cons i' x' xs')
          | rw < vw = ((v, vw), Urn (w - vw) vs)
          | otherwise = replace (rw - vw) i' x' xs'
            (\ys yw y -> ((y, yw), Urn (w - yw) ys))

        replace !rw i x Nil k = replaceTree rw x (\t -> k (Cons i t Nil))
        replace !rw i x xs@(Cons j y ys) k
          | rw < weight x = replaceTree rw x (\t -> k (Cons i t xs))
          | otherwise = replace (rw - weight x) j y ys (k . Cons i x)

        replaceTree !_  (Tree tw (Leaf x)) k = k (Tree vw (Leaf v)) tw x
        replaceTree !rw (Tree tw (Branch xs ys)) k
          | rw < weight xs = replaceTree rw xs
            (\t -> k (Tree (tw + (weight t - weight xs)) (Branch t ys)))
          | otherwise = replaceTree (rw - weight xs)
            (Tree (tw - weight xs) ys)
            (\t -> k (Tree (weight xs + weight t) (Branch xs (branch t))))

Merge is the same as on binomial heaps:

mergeHeap :: Heap a -> Heap a -> Heap a
mergeHeap Nil = id
mergeHeap (Cons i' x' xs') = merger i' x' xs'
  where
    merger !i x xs Nil = Cons i x xs
    merger !i x xs (Cons j y ys) = merge' i x xs j y ys

    merge' !i x xs !j y ys = case compare i j of
        LT -> Cons i x (merger (j-i-1) y ys xs)
        GT -> Cons j y (merger (i-j-1) x xs ys)
        EQ -> mergec (succ i) (mergeTree x y) xs ys

    mergec !p !t Nil = carryLonger p t
    mergec !p !t (Cons i x xs) = mergecr p t i x xs

    mergecr !p !t !i x xs Nil = carryLonger' p t i x xs
    mergecr !p !t !i x xs (Cons j y ys) = mergec' p t i x xs j y ys

    mergec' !p t !i x xs !j y ys = case compare i j of
      LT -> mergecr'' p t i x xs (j-i-1) y ys
      GT -> mergecr'' p t j y ys (i-j-1) x xs
      EQ -> Cons p t (mergec i (mergeTree x y) xs ys)

    mergecr'' !p !t  0 x xs !j y ys = mergecr (p+1) (mergeTree t x) j y ys xs
    mergecr'' !p !t !i x xs !j y ys = Cons p t (Cons (i-1) x (merger j y ys xs))

    carryLonger !i !t Nil = Cons i t Nil
    carryLonger !i !t (Cons j y ys) = carryLonger' i t j y ys

    carryLonger' !i !t  0 y ys = carryLonger (succ i) (mergeTree t y) ys
    carryLonger' !i !t !j y ys = Cons i t (Cons (j-1) y ys)

merge :: Urn a -> Urn a -> Urn a
merge (Urn i xs) (Urn j ys) = Urn (i+j) (mergeHeap xs ys)

Finger Trees

Again, the cleverness of all the tree folds is that they intelligently batch summarizing operations, allowing you to efficiently so prefix-scan-like operations that exploit sharing.

The bare-bones version just uses binary numbers: you can upgrade the cons operation to worst-case constant-time if you use skew binary. Are there other optimizations? Yes! What if we wanted to stick something on to the other end, for instance? What if we wanted to reverse?

If you figure out a way to do all these optimizations, and put them into one big data structure, you get the mother-of-all “batching” data structures: the finger tree. This is the basis for Haskell’s Data.Sequence, but it can also implement priority queues, urns (I’d imagine), fenwick-tree-like structures, and more.

Uses and Further Work

First and foremost, I should test the above implementations! I’m pretty confident the asymptotics are correct, but I’m certain the implementations have bugs.

The efficient merge is intriguing: it means that Urn could conceivably be Alternative, MonadPlus, etc. I have yet to see a use for that, but it’s interesting nonetheless! I’m constantly looking for a way to express something like Dijkstra’s algorithm algebraicly, using the usual Alternative combinators; I don’t know if this is related.

The other interesting point is that, for this to be an instance of Applicative, it would need some analogue for multiplication for the weights. I’m not sure what that should be.

This is inherently max-priority. It’s not obvious how to translate what we have into a min-priority queue version.

Finally, it might be worth trying out different priority queues (a pairing heap is very similar in structure to this). Also, we could rearrange the weights so that larger ones are higher in each tree: this might give a performance boost.

Lampropoulos, Leonidas, Antal Spector-Zabusky, and Kenneth Foner. 2017. “Ode on a random urn (functional pearl).” In, 26–37. ACM Press. doi:10.1145/3122955.3122959.

Okasaki, Chris. 1997. “Three Algorithms on Braun Trees.” Journal of Functional Programming 7 (6) (November): 661–666. doi:10.1017/S0956796897002876.

There’s one extra step I haven’t mentioned: we also must allow the first element (the last inserted) to be chosen, so we run the random-number generator once to check if that’s the element we want to choose.↩︎

Drawing Trees

Donnacha Oisín Kidney — Sun, 30 Dec 2018 00:00:00 UT

Posted on December 30, 2018

Tags: Haskell

For a bunch of algorithms it’s handy to get a quick-and-dirty visualization of a tree. Data.Tree has a tree-drawing function, but its output is too noisy for my taste, and so doesn’t really illustrate the underlying structure in a way I find helpful. This version uses the unicode box-drawing characters to give an output that’s midway between what is provided in Data.Tree and a full-blown SVG diagram. This makes it perfect for debugging tree-based algorithms while you’re writing them.

For the example tree in Wikipedia’s article on breadth-first search, it gives the following output:

     ┌─9
   ┌5┤
 ┌2┤ └10
 │ └6
1┼3
 │   ┌11
 │ ┌7┤
 └4┤ └12
   └8

module TreeDrawing where

import Data.Tree (Tree(..))
import Data.List (intercalate)

drawTree :: Tree String -> String
drawTree tr = (unlines . filter content . flatten) (foldr go undefined maxLengths withLength)
  where
    withLength = fmap (\x -> (length x, x)) tr
    maxLengths = lwe withLength (repeat 0)
    lwe (Node x xs) (q:qs) = max (fst x) q : foldr lwe qs xs
    
    content = any (`notElem` " │")
    flatten (ls,x,rs) = ls ++ [x] ++ rs
    mapZipper lf f rf (ls,x,rs) = (map lf ls, f x, map rf rs)
    toZipper xs = case splitAt (length xs `div` 2) xs of (ls,x:rs) -> (ls,x,rs)
    
    go m ls (Node (l,x) []) = ([],replicate (m-l) '─' ++ x,[])
    go m ls (Node (l,x) [y]) = mapZipper pad link pad (ls y)
      where
        padding = m + 1
        pad = (++) (replicate padding ' ')
        link z = replicate (m-l) '─' ++ x ++ "─" ++ z
    go m ls (Node (l,x') xs) = mapZipper pad link pad (toZipper (intercalate ["│"] ([ysh] ++ ysm ++ [ysl])))
      where 
        x = replicate (m-l) '─' ++ x'
        ys = map ls xs
        
        ysh = flatten (mapZipper (' ':) ('┌' :) ('│':) (head ys))
        ysl = flatten (mapZipper ('│':) ('└' :) (' ':) (last ys))
        ysm = map (flatten . mapZipper ('│':) ('├':) ('│':)) (init (tail ys))
        
        pad = (++) (replicate m ' ')
        
        link ('│':zs) = x ++ "┤" ++ zs
        link ('├':zs) = x ++ "┼" ++ zs
        link ('┌':zs) = x ++ "┬" ++ zs
        link ('└':zs) = x ++ "┴" ++ zs

liftAN

Donnacha Oisín Kidney — Sat, 29 Dec 2018 00:00:00 UT

Posted on December 29, 2018

Tags: Haskell

This function is now available on hackage.

There’s a family of functions in Control.Applicative which follow the pattern liftA2, liftA3, etc. Using some tricks from Richard Eisenberg’s thesis we can write them all at once.

{-# LANGUAGE DataKinds             #-}
{-# LANGUAGE TypeFamilies          #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE FlexibleContexts      #-}

module Apply where

data N = Z | S N

type family AppFunc f n a where
  AppFunc f Z a = f a
  AppFunc f (S n) (a -> b) = f a -> AppFunc f n b

type family CountArgs f where
  CountArgs (_ -> b) = S (CountArgs b)
  CountArgs _ = Z

class (CountArgs a ~ n) => Applyable a n where
  apply :: Applicative f => f a -> AppFunc f (CountArgs a) a

instance (CountArgs a ~ Z) => Applyable a Z where
  apply = id
  {-# INLINE apply #-}

instance Applyable b n => Applyable (a -> b) (S n) where
  apply f x = apply (f <*> x)
  {-# INLINE apply #-}

-- | >>> lift (\x y z -> x ++ y ++ z) (Just "a") (Just "b") (Just "c")
-- Just "abc"
lift :: (Applyable a n, Applicative f) => (b -> a) -> (f b -> AppFunc f n a)
lift f x = apply (fmap f x)
{-# INLINE lift #-}

Eisenberg, Richard A. “Dependent Types in Haskell: Theory and Practice.” University of Pennsylvania, 2016.

Balancing Scans

Donnacha Oisín Kidney — Fri, 21 Dec 2018 00:00:00 UT

Posted on December 21, 2018

Part 2 of a 3-part series on Balanced Folds

Tags: Haskell, Agda

Previously I tried to figure out a way to fold lists in a more balanced way. Usually, when folding lists, you’ve got two choices for your folds, both of which are extremely unbalanced in one direction or another. Jon Fairbairn wrote a more balanced version, which looked something like this:

treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold f = go
  where
    go x [] = x
    go a (b:l) = go (f a b) (pairMap l)
    pairMap (x:y:rest) = f x y : pairMap rest
    pairMap xs = xs

Magical Speedups

The fold above is kind of magical: for a huge class of algorithms, it kind of “automatically” improves some factor of theirs from $\mathcal{O}(n)$ to $\mathcal{O}(\log n)$ . For instance: to sum a list of floats, foldl' (+) 0 will have an error growth of $\mathcal{O}(n)$ ; treeFold (+) 0, though, has an error rate of $\mathcal{O}(\log n)$ . Similarly, using the following function to merge two sorted lists:

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge (x:xs) ys = go x xs ys
  where
    go x xs [] = x : xs
    go x xs (y:ys)
      | x <= y    = x : go y ys xs
      | otherwise = y : go x xs ys

We get either insertion sort ( $\mathcal{O}(n^2)$ ) or merge sort ( $\mathcal{O}(n \log n)$ ) just depending on which fold you use.

foldr    merge [] . map pure -- n^2
treeFold merge [] . map pure -- n log(n)

I’ll give some more examples later, but effectively it gives us a better “divide” step in many divide and conquer algorithms.

Termination

As it was such a useful fold, and so integral to many tricky algorithms, I really wanted to have it available in Agda. Unfortunately, though, the functions (as defined above) aren’t structurally terminating, and there doesn’t look like there’s an obvious way to make it so. I tried to make well founded recursion work, but the proofs were ugly and slow.

However, we can use some structures from a previous post: the nested binary sequence, for instance. It has some extra nice properties: instead of nesting the types, we can just apply the combining function.

mutual
  data Tree {a} (A : Set a) : Set a where
    2^_×_+_ : ℕ → A → Node A → Tree A

  data Node {a} (A : Set a) : Set a where
    ⟨⟩  : Node A
    ⟨_⟩ : Tree A → Node A

module TreeFold {a} {A : Set a} (_*_ : A → A → A) where
  infixr 5 _⊛_ 2^_×_⊛_

  2^_×_⊛_ : ℕ → A → Tree A → Tree A
  2^ n × x ⊛ 2^ suc m × y + ys = 2^ n × x + ⟨ 2^ m × y + ys ⟩
  2^ n × x ⊛ 2^ zero  × y + ⟨⟩ = 2^ suc n × (x * y) + ⟨⟩
  2^ n × x ⊛ 2^ zero  × y + ⟨ ys ⟩ = 2^ suc n × (x * y) ⊛ ys

  _⊛_ : A → Tree A → Tree A
  _⊛_ = 2^ 0 ×_⊛_

  ⟦_⟧↓ : Tree A → A
  ⟦ 2^ _ × x + ⟨⟩ ⟧↓ = x
  ⟦ 2^ _ × x + ⟨ xs ⟩ ⟧↓ = x * ⟦ xs ⟧↓

  ⟦_⟧↑ : A → Tree A
  ⟦ x ⟧↑ = 2^ 0 × x + ⟨⟩

  ⦅_,_⦆ : A → List A → A
  ⦅ x , xs ⦆ = ⟦ foldr _⊛_ ⟦ x ⟧↑ xs ⟧↓

Alternatively, we can get $\mathcal{O}(1)$ cons with the skew array:

infixr 5 _⊛_
_⊛_ : A → Tree A → Tree A
x ⊛ 2^ n × y  + ⟨⟩ = 2^ 0 × x + ⟨ 2^ n × y + ⟨⟩ ⟩
x ⊛ 2^ n × y₁ + ⟨ 2^ 0     × y₂ + ys ⟩ = 2^ suc n × (x * (y₁ * y₂)) + ys
x ⊛ 2^ n × y₁ + ⟨ 2^ suc m × y₂ + ys ⟩ = 2^ 0 × x + ⟨ 2^ n × y₁ + ⟨ 2^ m × y₂ + ys ⟩ ⟩

Using this, a proper and efficient merge sort is very straightforward:

data Total {a r} {A : Set a} (_≤_ : A → A → Set r) (x y : A) : Set (a ⊔ r) where
  x≤y : ⦃ _ : x ≤ y ⦄ → Total _≤_ x y
  y≤x : ⦃ _ : y ≤ x ⦄ → Total _≤_ x y

module Sorting {a r}
               {A : Set a}
               {_≤_ : A → A → Set r}
               (_≤?_ : ∀ x y → Total _≤_ x y) where
  data [∙] : Set a where
    ⊥   : [∙]
    [_] : A → [∙]

  data _≥_ (x : A) : [∙] → Set (a ⊔ r) where
    instance ⌈_⌉ : ∀ {y} → y ≤ x → x ≥ [ y ]
    instance ⌊⊥⌋ : x ≥ ⊥

  infixr 5 _∷_
  data Ordered (b : [∙]) : Set (a ⊔ r) where
    []  : Ordered b
    _∷_ : ∀ x → ⦃ x≥b : x ≥ b ⦄ → (xs : Ordered [ x ]) → Ordered b

  _∪_ : ∀ {b} → Ordered b → Ordered b → Ordered b
  [] ∪ ys = ys
  (x ∷ xs) ∪ ys = ⟅ x ∹ xs ∪ ys ⟆
    where
    ⟅_∹_∪_⟆ : ∀ {b} → ∀ x ⦃ _ : x ≥ b ⦄ → Ordered [ x ] → Ordered b → Ordered b
    ⟅_∪_∹_⟆ : ∀ {b} → Ordered b → ∀ y ⦃ _ : y ≥ b ⦄ → Ordered [ y ] → Ordered b
    merge : ∀ {b} x y ⦃ _ : x ≥ b ⦄ ⦃ _ : y ≥ b ⦄
          → Total _≤_ x y
          → Ordered [ x ]
          → Ordered [ y ]
          → Ordered b

    ⟅ x ∹ xs ∪ [] ⟆ = x ∷ xs
    ⟅ x ∹ xs ∪ y ∷ ys ⟆ = merge x y (x ≤? y) xs ys
    ⟅ [] ∪ y ∹ ys ⟆ = y ∷ ys
    ⟅ x ∷ xs ∪ y ∹ ys ⟆ = merge x y (x ≤? y) xs ys

    merge x y x≤y xs ys = x ∷ ⟅ xs ∪ y ∹ ys ⟆
    merge x y y≤x xs ys = y ∷ ⟅ x ∹ xs ∪ ys ⟆


  open TreeFold

  sort : List A → Ordered ⊥
  sort = ⦅ _∪_ , [] ⦆ ∘ map (_∷ [])

Validity

It would be nice if we could verify these optimizated versions of folds. Luckily, by writing them using foldr, we’ve stumbled into well-trodden ground: the foldr fusion law. It states that if you have some transformation $f$ , and two binary operators $\oplus$ and $\otimes$ , then:

$\begin{align} f (x \oplus y) &&=\;& x \otimes f y \\ \implies f \circ \text{foldr} \oplus e &&=\;& \text{foldr} \otimes (f e) \end{align}$

This fits right in with the function we used above. $f$ is ⟦_⟧↓, $\oplus$ is _⊛_, and $\otimes$ is whatever combining function was passed in. Let’s prove the foldr fusion law, then, before we go any further.

module Proofs
  {a r}
  {A : Set a}
  {R : Rel A r}
  where

  infix 4 _≈_
  _≈_ = R

  open import Algebra.FunctionProperties _≈_

  foldr-universal : Transitive _≈_
                  → ∀ {b} {B : Set b} (h : List B → A) f e
                  → ∀[ f ⊢ Congruent₁ ]
                  → (h [] ≈ e)
                  → (∀ x xs → h (x ∷ xs) ≈ f x (h xs))
                  → ∀ xs → h xs ≈ foldr f e xs
  foldr-universal _○_ h f e f⟨_⟩ ⇒[] ⇒_∷_ [] = ⇒[]
  foldr-universal _○_ h f e f⟨_⟩ ⇒[] ⇒_∷_ (x ∷ xs) =
    (⇒ x ∷ xs) ○ f⟨ foldr-universal _○_ h f e f⟨_⟩ ⇒[] ⇒_∷_ xs ⟩

  foldr-fusion : Transitive _≈_
               → Reflexive _≈_
               → ∀ {b c} {B : Set b} {C : Set c} (f : C → A) {_⊕_ : B → C → C} {_⊗_ : B → A → A} e
               → ∀[ _⊗_ ⊢ Congruent₁ ]
               → (∀ x y → f (x ⊕ y) ≈ x ⊗ f y)
               → ∀ xs → f (foldr _⊕_ e xs) ≈ foldr _⊗_ (f e) xs
  foldr-fusion _○_ ∎ h {f} {g} e g⟨_⟩ fuse =
    foldr-universal _○_ (h ∘ foldr f e) g (h e) g⟨_⟩ ∎ (λ x xs → fuse x (foldr f e xs))

We’re not using the proofs in Agda’s standard library because these are tied to propositional equality. In other words, instead of using an abstract binary relation, they prove things over actual equality. That’s all well and good, but as you can see above, we don’t need propositional equality: we don’t even need the relation to be an equivalence, we just need transitivity and reflexivity.

After that, we can state precisely what correspondence the tree fold has, and under what conditions it does the same things as a fold:

module _ {_*_ : A → A → A} where
  open TreeFold _*_

  treeFoldHom : Transitive _≈_
              → Reflexive _≈_
              → Associative _*_
              → RightCongruent _*_
              → ∀ x xs
              → ⦅ x , xs ⦆ ≈ foldr _*_ x xs
  treeFoldHom _○_ ∎ assoc *⟨_⟩ b = foldr-fusion _○_ ∎ ⟦_⟧↓ ⟦ b ⟧↑ *⟨_⟩ (⊛-hom zero)
    where
    ⊛-hom : ∀ n x xs → ⟦ 2^ n × x ⊛ xs ⟧↓ ≈ x * ⟦ xs ⟧↓
    ⊛-hom n x (2^ suc m × y + ⟨⟩    ) = ∎
    ⊛-hom n x (2^ suc m × y + ⟨ ys ⟩) = ∎
    ⊛-hom n x (2^ zero  × y + ⟨⟩    ) = ∎
    ⊛-hom n x (2^ zero  × y + ⟨ ys ⟩) = ⊛-hom (suc n) (x * y) ys ○ assoc x y ⟦ ys ⟧↓

“Implicit” Data Structures

Consider the following implementation of the tree above in Haskell:

type Tree a = [(Int,a)]

cons :: (a -> a -> a) -> a -> Tree a -> Tree a
cons (*) = cons' 0 
  where
    cons' n x [] = [(n,x)]
    cons' n x ((0,y):ys) = cons' (n+1) (x * y) ys
    cons' n x ((m,y):ys) = (n,x) : (m-1,y) : ys

The cons function “increments” that list as if it were the bits of a binary number. Now, consider using the merge function from above, in a pattern like this:

f = foldr (cons merge . pure) []

What does f build? A list of lists, right?

Kind of. That’s what’s built in terms of the observable, but what’s actually stored in memory us a bunch of thunks. The shape of those is what I’m interested in. We can try and see what they look like by using a data structure that doesn’t force on merge:

data Tree a = Leaf a | Tree a :*: Tree a

f = foldr (cons (:*:) . Leaf) []

Using a handy tree-drawing function, we can see what f [1..13] looks like:

[(0,*),(1,*),(0,*)]
    └1    │ ┌2  │  ┌6
          │┌┤   │ ┌┤
          ││└3  │ │└7
          └┤    │┌┤
           │┌4  │││┌8
           └┤   ││└┤
            └5  ││ └9
                └┤
                 │ ┌10
                 │┌┤
                 ││└11
                 └┤
                  │┌12
                  └┤
                   └13

It’s a binomial heap! It’s a list of trees, each one contains $2^n$ elements. But they’re not in heap order, you say? Well, as a matter of fact, they are. It just hasn’t been evaluated yet. Once we force—say—the first element, the rest will shuffle themselves into a tree of thunks.

This illustrates a pretty interesting similarity between binomial heaps and merge sort. Performance-wise, though, there’s another interesting property: the thunks stay thunked. In other words, if we do a merge sort via:

sort = foldr (merge . snd) [] . foldr (cons merge . pure) []

We could instead freeze the fold, and look at it at every point:

sortPrefixes = map (foldr (merge . snd) []) . scanl (flip (cons merge . pure)) []
>>> [[],[1],[1,4],[1,2,4],[1,2,3,4],[1,2,3,4,5]]

And sortPrefixes is only $\mathcal{O}(n^2)$ (rather than $\mathcal{O}(n^2 \log n)$ ). I confess I don’t know of a use for sorted prefixes, but it should illustrate the general idea: we get a pretty decent batching of operations, with the ability to freeze at any point in time. The other nice property (which I mentioned in the last post) is that any of the tree folds are extremely parallel.

Random Shuffles

There’s a great article on shuffling in Haskell which provides an $\mathcal{O}(n \log n)$ implementation of a perfect random shuffle. Unfortunately, the Fisher-Yates shuffle isn’t applicable in a pure functional setting, so you have to be a little cleverer.

The first implementation most people jump to (certainly the one I thought of) is to assign everything in the sequence a random number, and then sort according to that number. Perhaps surprisingly, this isn’t perfectly random! It’s a little weird, but the example in the article explains it well: basically, for $n$ elements, your random numbers will have $n^n$ possible values, but the output of the sort will have $n!$ possible values. Since they don’t divide into each other evenly, you’re going to have some extra weight on some permutations, and less on others.

Instead, we can generate a random factoradic number. A factoradic number is one where the $n$ th digit is in base $n$ . Because of this, a factoradic number with $n$ digits has $n!$ possible values: exactly what we want.

In the article, the digits of the number are used to pop values from a binary tree. Because the last digit will have $n$ possible values, and the second last $n-1$ , and so on, you can keep popping without hitting an empty tree.

This has the correct time complexity— $\mathcal{O}(n \log n)$ —but there’s a lot of overhead. Building the tree, then indexing into it, the rebuilding after each pop, etc.

We’d like to just sort the list, according to the indices. The problem is that the indices are relative: if you want to cons something onto the list, you have to increment the rest of the indices, as they’ve all shifted right by one.

What we’ll do instead is use the indices as gaps. Our merge function looks like the following:

merge [] ys = ys
merge xs [] = xs
merge ((x,i):xs) ((y,j):ys)
  | i <= j    = (x,i) : merge xs ((y,j-i):ys)
  | otherwise = (y,j) : merge ((x,i-j-1):xs) ys

With that, and the same cons as above, we get a very simple random shuffle algorithm:

shuffle xs = map fst
           . foldr (merge . snd) []
           . foldr f (const []) xs
  where
    f x xs (i:is) = cons merge [(x,i)] (xs is)

The other interesting thing about this algorithm is that it can use Peano numbers with taking too much of a performance hit:

merge : ∀ {a} {A : Set a} → List (A × ℕ) → List (A × ℕ) → List (A × ℕ)
merge xs [] = xs
merge {A = A} xs ((y , j) ∷ ys) = go-r xs y j ys
  where
  go-l : A → ℕ → List (A × ℕ) → List (A × ℕ) → List (A × ℕ)
  go-r : List (A × ℕ) → A → ℕ → List (A × ℕ) → List (A × ℕ)
  go : ℕ → ℕ → A → ℕ → List (A × ℕ) → A → ℕ → List (A × ℕ) → List (A × ℕ)

  go i     zero   x i′ xs y j′ ys = (y , j′) ∷ go-l x i xs ys
  go zero (suc j) x i′ xs y j′ ys = (x , i′) ∷ go-r xs y j ys
  go (suc i) (suc j) = go i j

  go-l x i xs [] = (x , i) ∷ xs
  go-l x i xs ((y , j) ∷ ys) = go i j x i xs y j ys

  go-r [] y j ys = (y , j) ∷ ys
  go-r ((x , i) ∷ xs) y j ys = go i j x i xs y j ys

shuffle : ∀ {a} {A : Set a} → List A → List ℕ → List A
shuffle {a} {A} xs i = map proj₁ (⦅ [] , zip-inds xs i ⦆)
  where
  open TreeFold {a} {List (A × ℕ)} merge

  zip-inds : List A → List ℕ → List (List (A × ℕ))
  zip-inds [] inds = []
  zip-inds (x ∷ xs) [] = ((x , 0) ∷ []) ∷ zip-inds xs []
  zip-inds (x ∷ xs) (i ∷ inds) = ((x , i) ∷ []) ∷ zip-inds xs inds

I don’t know exactly what the complexity of this is, but I think it should be better than the usual approach of popping from a vector.

Future Stuff

This is just a collection of random thoughts for now, but I intend to work on using these folds to see if there are any other algorithms they can be useful for. In particular, I think I can write a version of Data.List.permutations which benefits from sharing. And I’m interested in using the implicit binomial heap for some search problems.

Pure and Lazy Breadth-First Traversals of Graphs in Haskell

Donnacha Oisín Kidney — Tue, 18 Dec 2018 00:00:00 UT

Posted on December 18, 2018

Part 5 of a 10-part series on Breadth-First Traversals

Tags: Haskell

Today, I’m going to look at extending the previous breadth-first traversal algorithms to arbitrary graphs (rather than just trees). Graphs with cycles are notoriously cumbersome in functional languages, so this actually proves to be a little trickier than I thought it would be. First, a quick recap.

3 Ways to Breadth-First Search

So far, we have three major ways to traverse a tree in breadth-first order. The first is the simplest, and the fastest:

bfe :: Tree a -> [a]
bfe r = f r b []
  where
    f (Node x xs) fw bw = x : fw (xs : bw)
  
    b [] = []
    b qs = foldl (foldr f) b qs []

Given a tree like the following:

   ┌4
 ┌2┤
 │ │ ┌8
 │ └5┤
 │   └9
1┤
 │   ┌10
 │ ┌6┘
 └3┤
   └7

We get:

>>> bfe tree
[1,2,3,4,5,6,7,8,9,10]

It also demonstrates a theme that will run through this post: lists are the only visible data structure (other than the tree, of course). However, we are carefully batching the operations on those lists (the foldl is effectively a reverse) so that they have the same complexity as if we had used a queue. In actual fact, when lists are used this way, they are queues: “corecursive” ones (Allison 2006; Smith 2009).

The next two functions perform a breadth-first traversal “level-wise”: instead of just returning all the nodes of the tree, we get them delimited by how far they are from the root.

lwe :: Tree a -> [[a]]
lwe r = f b r [] []
  where
    f k (Node x xs) ls qs = k (x : ls) (xs : qs)

    b _ [] = []
    b k qs = k : foldl (foldl f) b qs [] []

>>> lwe tree
[[1],[2,3],[4,5,6,7],[8,9,10]]

The above function is very clearly related to the bfe function: we just add another queue (representing the current level), and work from there.

The third of these functions also does level-wise enumeration, but in a direct style (without continuations).

lwe :: Tree a -> [[a]]
lwe r = f r []
  where
    f (Node x xs) (q:qs) = (x:q) : foldr f qs xs
    f (Node x xs) []     = [x]   : foldr f [] xs

There are more techniques out there than just these three (including the one in Data.Tree), but these are my favorite, and they’re what I’ll be looking at today.

Graphs and Purity

Functional programming in general excels at working with trees and similar data structures. Graphs, though, are trickier. There’s been a lot of recent work in improving the situation (Mokhov 2017), but I’m going to keep it simple today: a graph is just a function.

type Graph a = a -> [a]

So the tree from above could be represented as:

graph 1 = [2,3]
graph 2 = [4,5]
graph 3 = [6,7]
graph 5 = [8,9]
graph 6 = [10]
graph _ = []

As it happens, all of the algorithms that follow will work on graphs represented as rose trees (or represented any way, really).

So let’s fire up our first traversal!

bfs :: Graph a -> Graph a
bfs g r = f r b []
  where
    f x fw bw = x : fw (g x : bw)
  
    b [] = []
    b qs = foldl (foldr f) b qs []
    
>>> bfs graph 1
[1,2,3,4,5,6,7,8,9,10]

Unfortunately, this won’t handle cycles properly:

graph 1 = [2,3]
graph 2 = [4,5,1]
graph 3 = [6,7]
graph 5 = [8,9]
graph 6 = [10]
graph _ = []

>>> bfs graph 1
[1,2,3,4,5,1,6,7,8,9,2,3,10,4,5,1,6,7,8,9,2,3,10,4,5,1,6,7,8,9,2,3,10,4,5...

We need a way to mark off what we’ve already seen. The following isn’t good enough, also:

>>> nub (bfs graph 1)
[1,2,3,4,5,6,7,8,9,10...

It will hang without finishing the list. The solution is to mark off nodes as we find them, with some set structure:

bfs :: Ord a => Graph a -> Graph a
bfs g ts = f ts b [] Set.empty
  where
    f x fw bw s
      | Set.member x s = fw bw s
      | otherwise      = x : fw (g x : bw) (Set.insert x s)

    b [] _ = []
    b qs s = foldl (foldr f) b qs [] s

>>> bfs graph 1
[1,2,3,4,5,6,7,8,9,10]

The levelwise algorithm is similar:

lws :: Ord a => Graph a -> a -> [[a]] 
lws g r = f b r [] [] Set.empty
  where
    f k x ls qs s
      | Set.member x s = k ls qs s
      | otherwise = k (x : ls) (g x : qs) (Set.insert x s)

    b _ [] _ = []
    b k qs s = k : foldl (foldl f) b qs [] [] s

Tying the Knot

The other levelwise algorithm doesn’t translate across so easily. To see why, let’s look at the version without cycle detection:

lws :: Graph a -> a -> [[a]]
lws g r = f r []
  where
    f x (q:qs) = (x:q) : foldr f qs (g x)
    f x []     = [x]   : foldr f [] (g x)

The recursive call is being made depth-first, not breadth-first. The result, of course, is breadth-first, but that’s only because the recursive call zips as it goes.

Just looking at the fourth line for now:

f x (q:qs) = (x:q) : foldr f qs (g x)

We want whatever process built up that q to be denied access to x. The following doesn’t work:

f x (q:qs) = (x:filter (x/=) q) : foldr f qs (g x)

As well as being terribly slow, the later computation can diverge when it finds a cycle, and filtering won’t do anything to help that.

The solution is to “tie the knot”. We basically do two passes over the data: one to build up the “seen so far” list, and then another to do the actual search. The trick is to do both of these passes at once, and feed the result back into the demanding computation.

lws g r = takeWhile (not.null) (map fst (fix (f r . push)))
  where
    push xs = ([],Set.empty) : [ ([],seen) | (_,seen) <- xs ]
    f x q@((l,s):qs)
      | Set.member x s = q
      | otherwise = (x:l, Set.insert x s) : foldr f qs (g x)

And it works!

I got the idea for this trick from the appendix of Okasaki (2000). There’s something similar in Kiselyov (2002).

References

Allison, Lloyd. 2006. “Circular Programs and Self-Referential Structures.” Software: Practice and Experience 19 (2) (October): 99–109. doi:10.1002/spe.4380190202.

Kiselyov, Oleg. 2002. “Pure-functional transformations of cyclic graphs and the Credit Card Transform.” http://okmij.org/ftp/Haskell/AlgorithmsH.html#ccard-transform.

Mokhov, Andrey. 2017. “Algebraic Graphs with Class (Functional Pearl).” In Proceedings of the 10th ACM SIGPLAN International Symposium on Haskell, 2–13. Haskell 2017. New York, NY, USA: ACM. doi:10.1145/3122955.3122956.

Okasaki, Chris. 2000. “Breadth-first Numbering: Lessons from a Small Exercise in Algorithm Design.” In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, 131–136. ICFP ’00. New York, NY, USA: ACM. doi:10.1145/351240.351253.

Smith, Leon P. 2009. “Lloyd Allison’s Corecursive Queues: Why Continuations Matter.” The Monad.Reader 14 (14) (July): 28.

Prime Sieves in Agda

Donnacha Oisín Kidney — Fri, 14 Dec 2018 00:00:00 UT

Posted on December 14, 2018

Part 2 of a 2-part series on Prime Sieves

Tags: Agda

Prime numbers in Agda are slow. First, they’re Peano-based, so a huge chunk of optimizations we might make in other languages are out of the window. Second, we really often want to prove that they’re prime, so the generation code has to carry verification logic with it (I won’t do that today, though). And third, as always in Agda, you have to convince the compiler of termination. With all of that in mind, let’s try and write a (very slow, very basic) prime sieve in Agda.

First, we can make an “array” of numbers that we cross off as we go.

primes : ∀ n → List (Fin n)
primes zero = []
primes (suc zero) = []
primes (suc (suc zero)) = []
primes (suc (suc (suc m))) = sieve (tabulate (just ∘ Fin.suc))
  where
  cross-off : Fin _ → List (Maybe (Fin _)) → List (Maybe (Fin _))

  sieve : List (Maybe (Fin _)) → List (Fin _)
  sieve [] = []
  sieve (nothing ∷ xs) =         sieve xs
  sieve (just x  ∷ xs) = suc x ∷ sieve (cross-off x xs)

  cross-off p fs = foldr f (const []) fs p
    where
    B = ∀ {i} → Fin i → List (Maybe (Fin (2 + m)))

    f : Maybe (Fin (2 + m)) → B → B
    f _ xs zero    = nothing ∷ xs p
    f x xs (suc y) = x       ∷ xs y

Very simple so far: we run through the list, filtering out the multiples of each prime as we see it. Unfortunately, this won’t pass the termination checker. This recursive call to sieve is the problem:

sieve (just x ∷ xs) = suc x ∷ sieve (cross-off x xs)

Agda finds if a function is terminating by checking that at least one argument gets (structurally) smaller on every recursive call. sieve only takes one argument (the input list), so that’s the one that needs to get smaller. In the line above, if we replaced it with the following:

sieve (just x ∷ xs) = suc x ∷ sieve xs

We’d be good to go: xs is definitely smaller than (just x ∷ xs). cross-off x xs, though? The thing is, cross-off returns a list of the same length that it’s given. But the function call is opaque: Agda can’t automatically see the fact that the length stays the same. Reaching for a proof here is the wrong move, though: you can get all of the same benefit by switching out the list for a length-indexed vector.

primes : ∀ n → List (Fin n)
primes zero = []
primes (suc zero) = []
primes (suc (suc zero)) = []
primes (suc (suc (suc m))) = sieve (tabulate (just ∘ Fin.suc))
  where
  cross-off : ∀ {n} → Fin _ → Vec (Maybe _) n → Vec (Maybe _) n

  sieve : ∀ {n} →  Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))
  sieve [] = []
  sieve (nothing ∷ xs) =         sieve xs
  sieve (just x  ∷ xs) = suc x ∷ sieve (cross-off x xs)

  cross-off p fs = foldr B f (const []) fs p
    where
    B = λ n → ∀ {i} → Fin i → Vec (Maybe (Fin (2 + m))) n

    f : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)
    f _ xs zero    = nothing ∷ xs p
    f x xs (suc y) = x       ∷ xs y

Actually, my explanation above is a little bit of a lie. Often, the way I think about dependently-typed programs has a lot to do with my intuition for “proofs” and so on. But this leads you down the wrong path (and it’s why writing a proof that cross-off returns a list of the same length is the wrong move).

The actual termination checking algorithm is very simple, albeit strict: the argument passed recursively must be structurally smaller. That’s it. Basically, the recursive argument has to be contained in one of the arguments passed. It has nothing to do with Agda “seeing” inside the function cross-off or anything like that. What we’ve done above (to make it terminate) is add another argument to the function: the length of the vector. The argument is implicit, but if we were to make it explicit in the recursive call:

sieve {suc n} (just x  ∷ xs) = suc x ∷ sieve {n} (cross-off x xs)

We can see that it does indeed get structurally smaller.

Adding the Squaring Optimization

A simple improvement we should be able to make is stopping once we hit the square root of the limit. Since we don’t want to be squaring as we go, we’ll use the following identity:

$(n + 1)^2 = n^2 + 2n + 1$

to figure out the square of the next number from the previous. In fact, we’ll just pass in the limit, and reduce it by $2n + 1$ each time, until it reaches zero:

primes : ∀ n → List (Fin n)
primes zero = []
primes (suc zero) = []
primes (suc (suc zero)) = []
primes (suc (suc (suc m))) = sieve 1 m (Vec.tabulate (just ∘ Fin.suc ∘ Fin.suc))
  where
  cross-off : ∀ {n} → ℕ → Vec (Maybe _) n → Vec (Maybe _) n

  sieve : ∀ {n} → ℕ → ℕ → Vec (Maybe (Fin (3 + m))) n → List (Fin (3 + m))
  sieve _ zero = List.mapMaybe id ∘ Vec.toList
  sieve _ (suc _) [] = []
  sieve i (suc l) (nothing ∷ xs) =     sieve (suc i) (l ∸ i ∸ i) xs
  sieve i (suc l) (just x  ∷ xs) = x ∷ sieve (suc i) (l ∸ i ∸ i) (cross-off i xs)

  cross-off p fs = Vec.foldr B f (const []) fs p
    where
      B = λ n → ℕ → Vec (Maybe (Fin (3 + m))) n

      f : ∀ {i} → Maybe (Fin (3 + m)) → B i → B (suc i)
      f _ xs zero    = nothing ∷ xs p
      f x xs (suc y) = x       ∷ xs y

Finding Prime Factors

A slight variation on the code above (the first version) will give us the prime factors of a number:

primeFactors : ∀ n → List (Fin n)
primeFactors zero = []
primeFactors (suc zero) = []
primeFactors (suc (suc zero)) = []
primeFactors (suc (suc (suc m))) = sieve (Vec.tabulate (just ∘ Fin.suc))
  where
  sieve : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))
  sieve [] = []
  sieve (nothing ∷ xs) = sieve xs
  sieve (just x  ∷ xs) = Vec.foldr B remove b xs sieve x
    where
    B = λ n → ∀ {i}
            → (Vec (Maybe (Fin (2 + m))) n
            → List (Fin (3 + m)))
            → Fin i
            → List (Fin (3 + m))

    b : B 0
    b k zero    = suc x ∷ k []
    b k (suc _) =         k []

    remove : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)
    remove y ys k zero    = ys (k ∘ (nothing ∷_)) x
    remove y ys k (suc j) = ys (k ∘ (y ∷_)) j

Adding the squaring optimization complicates things significantly:

primeFactors : ∀ n → List (Fin n)
primeFactors zero = []
primeFactors (suc zero) = []
primeFactors (suc (suc zero)) = []
primeFactors (suc (suc (suc m))) = sqr (suc m) m suc sieve
  where
  _2F-_ : ∀ {n} → ℕ → Fin n → ℕ
  x           2F- zero = x
  zero        2F- suc y = zero
  suc zero    2F- suc y = zero
  suc (suc x) 2F- suc y = x 2F- y

  sqr : ∀ n
      → ℕ
      → (Fin n → Fin (2 + m))
      → (∀ {i} → Vec (Maybe (Fin (2 + m))) i → ℕ → List (Fin (3 + m)))
      → List (Fin (3 + m))
  sqr n       zero    f k = k [] n
  sqr zero    (suc l) f k = k [] zero
  sqr (suc n) (suc l) f k =
    let x = f zero
    in sqr n (l 2F- x) (f ∘ suc) (k ∘ (just x ∷_))

  sieve : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → ℕ → List (Fin (3 + m))
  sieve xs′ i = go xs′
    where
    go : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))
    go [] = []
    go (nothing ∷ xs) = go xs
    go (just x  ∷ xs) = Vec.foldr B remove (b i) xs x go
      where
      B = λ n → ∀ {i}
              → Fin i
              → (Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m)))
              → List (Fin (3 + m))

      b : ℕ → B 0
      b zero    zero    k = suc x ∷ k []
      b zero    (suc y) k = k []
      b (suc n) zero    k = b n x k
      b (suc n) (suc y) k = b n y k

      remove : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)
      remove y ys zero    k = ys x (k ∘ (nothing ∷_))
      remove y ys (suc j) k = ys j (k ∘ (y ∷_))

Infinitude

The above sieve aren’t “true” in that each remove is linear, so the performance is $\mathcal{O}(n^2)$ overall. This is the same problem we ran into with the naive infinite sieve in Haskell.

Since it bears such a similarity to the infinite sieve, we have to ask: can this sieve be infinite? Agda supports a notion of infinite data, so it would seem like it:

infixr 5 _◂_
record Stream (A : Set) : Set where
  constructor _◂_
  coinductive
  field
    head : A
    tail : Stream A
open Stream

primes : Stream ℕ
primes = sieve 1 nats
  where
  nats : Stream ℕ
  head nats = 0
  tail nats = nats

  sieve : ℕ → Stream ℕ → Stream ℕ
  head (sieve i xs) = suc i
  tail (sieve i xs) = remove i (head xs) (tail xs) (sieve ∘ suc ∘ (_+ i))
    where
    remove : ℕ → ℕ → Stream ℕ → (ℕ → Stream ℕ → Stream ℕ) → Stream ℕ
    remove zero zero zs       k = remove i (head zs) (tail zs) (k ∘ suc)
    remove zero (suc z) zs    k = remove i z zs (k ∘ suc)
    remove (suc y) zero zs    k = k zero (remove y (head zs) (tail zs) _◂_)
    remove (suc y) (suc z) zs k = remove y z zs (k ∘ suc)

But this won’t pass the termination checker. What we actually need to prove to do so is that there are infinitely many primes: a nontrivial task in Agda.

Keeping Formal Verification in Bounds

Donnacha Oisín Kidney — Tue, 20 Nov 2018 00:00:00 UT

Posted on November 20, 2018

Tags: Haskell, Agda

One of the favorite pastimes of both Haskell and Agda programmers alike is verifying data structures. Among my favorite examples are Red-Black trees (Might 2015; Weirich 2014, verified for balance), perfect binary trees (Hinze 1999), square matrices (Okasaki 1999a), search trees (McBride 2014, verified for balance and order), and binomial heaps (Hinze 1998, verified for structure).

There are many ways to verify data structures. One technique which has had recent massive success is to convert Haskell code to Coq, and then verify the Coq translation: this was the route taken by Breitner et al. (2018) to verify Set and IntSet in containers (a mammoth achievement, in my opinion).

This approach has some obvious advantages: you separate implementation from testing (which is usually a good idea), and your verification language can be different from your implementation language, with each tailored towards its particular domain.

LiquidHaskell (Bakst et al. 2018) (and other tools like it) adds an extra type system to Haskell tailor-made for verification. The added type system (refinement types) is more automated (the typechecker uses Z3), more suited for “invariant”-like things (it supports subtyping), and has a bunch of domain-specific built-ins (reasoning about sets, equations, etc.). I’d encourage anyone who hasn’t used it to give it a try: especially if you’re experienced writing any kind of proof in a language like Agda or Idris, LiquidHaskell proofs are shockingly simple and easy.

What I’m going to focus on today, though, is writing correct-by-construction data structures, using Haskell and Agda’s own type systems. In particular, I’m going to look at how to write fast verification. In the other two approaches, we don’t really care about the “speed” of the proofs: sure, it’s nice to speed up compilation and so on, but we don’t have to worry about our implementation suffering at runtime because of some complex proof. When writing correct-by-construction code, though, our task is doubly hard: we now have to worry about the time complexity of both the implementation and the proofs.

In this post, I’m going to demonstrate some techniques to write proofs that stay within the complexity bounds of the algorithms they’re verifying (without cheating!). Along the way I’m going to verify some data structures I haven’t seen verified before (a skew-binary random-access list).

Technique 1: Start With an Unverified Implementation, then Index

To demonstrate the first two techniques, we’re going to write a type for modular arithmetic. For a more tactile metaphor, think of the flip clock:

Each digit can be incremented $n$ times, where $n$ is whatever base you’re using (12 for our flip-clock above). Once you hit the limit, it flips the next digit along. We’ll start with just one digit, and then just string them together to get our full type. That in mind, our “digit” type has two requirements:

It should be incrementable.
Once it hits its limit, it should flip back to zero, and let us know that a flip was performed.

Anyone who’s used a little Agda or Idris will be familiar with the Fin type:

data Fin : ℕ → Set where
  zero : {n : ℕ} → Fin (suc n)
  suc  : {n : ℕ} → Fin n → Fin (suc n)

Fin n is the standard way to encode “numbers smaller than n”. However, for digits they’re entirely unsuitable: since the limit parameter changes on successor, the kind of increment we want is $\mathcal{O}(n)$ :

try-suc : ∀ {n} → Fin n → Maybe (Fin n)
try-suc (suc x) = Maybe.map suc (try-suc x)
try-suc {suc n} zero with n
... | zero = nothing
... | suc _ = just (suc zero)

suc-flip : ∀ {n} → Fin n → Fin n × Bool
suc-flip {suc n} x = maybe (_, false) (zero , true) (try-suc x)
suc-flip {zero} ()

If we keep going down this path with proofs in mind, we might next look at the various $\leq$ proofs in the Agda standard library (here, here, and here), and see if we can we can wrangle them into doing what we want.

For me, though, this wasn’t a fruitful approach. Instead, we’ll try and think of how we’d do this without proving anything, and then see if there’s any place in the resulting data structure we can hang some proof.

So, in an unproven way, let’s start with some numbers. Since we’re going to be incrementing, they’d better be unary:

data ℕ : Set where
  zero : ℕ
  suc : ℕ → ℕ

And then, for the “flippable” type, we’ll just store the limit alongside the value:

record Flipper : Set where
  constructor _&_
  field
    val : ℕ
    lim : ℕ

We’re not there yet: to check if we’ve gone over the limit, we’ll still have to compare val and lim. Hopefully you can guess the optimization we’ll make: instead of storing the limit, we’ll store the space left:

record Flipper : Set where
  constructor _&_
  field
    space : ℕ
    val   : ℕ

And we get our flip function:

suc-flip : Flipper → Flipper × Bool
suc-flip (zero  & n) = (suc n & zero ), true
suc-flip (suc m & n) = (m     & suc n), false

When there’s no space left, the digit must be maximal (9 in decimal, for instance), so it’ll be one less than the base. That lets us stick it in for the base, rather than recalculating. In the other case, we just take one from the space left, and add it to the value.

So, to “prove” this implementation, we might first reach for an equality proof that val + space is equal to your base. Don’t! Both val and space are inductive structures, which could be giving us information on every application of suc! Let’s set our sights on val and see how we can hang our proofs off of it.

We’re going to upgrade our Peano number with some information, which means that our resulting type is going to look an awful lot like a Peano number. In other words, two cases: zero and suc.

data Val _ : Set where
  zero-case : Val _
  suc-case  : Val _ → Val _

For the suc-case, remember we only want to be allowed to increment it when the space left is more than zero. So let’s encode it:

data Val _ : ℕ → Set where
  zero-case : Val _
  suc-case  : ∀ {space} → Val _ (suc space) → Val _ space

And for the zero-case, the space left is just the base. So let’s stick the base into the type as well:

data Val (base : ℕ) : ℕ → Set where
  zero-case : Val base base
  suc-case  : ∀ {space} → Val base (suc space) → Val base space

(We’ve changed around the way “base” works: it’s now one smaller. So to encode base-10 you’d have Val 9 space. You can get back to the other encoding with a simple wrapper, this way just makes things slightly easier from now on).

Finally, our flipper:

record Flipper (base : ℕ) : Set where
  constructor _&_
  field
    space : ℕ
    val : Val base space

suc-flip : ∀ {n} → Flipper n → Flipper n × Bool
suc-flip (zero  & m) = (_ &  zero-case) , true
suc-flip (suc n & m) = (n & suc-case m) , false

Great! Everything works.

You may have noticed that the Val type is actually a proof for $\geq$ in disguise:

data _≥_ (m : ℕ) : ℕ → Set where
  m≥m : m ≥ m
  m≥p : ∀ {n} → m ≥ suc n → m ≥ n

And the flipper itself is just an existential in disguise:

Flipper : ℕ → Set
Flipper n = ∃ (n ≥_)

suc-flip : ∀ {n} → Flipper n → Flipper n × Bool
suc-flip (zero  , m) = (_ , m≥m  ), true
suc-flip (suc n , m) = (n , m≥p m), false

Hopefully this explanation will help you understand how to get from the specification to those 8 lines. This technique is going to come in especially handy later when we base data structures off of number systems.

Technique 2: Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth.

For this next trick, we’ll add an extra operation to the flipper type above: conversion from a natural number. We want to be able to do it in $\mathcal{O}(n)$ time, and we won’t allow ourselves to change the original type definition. Here’s the type we’re aiming for:

fromNat : ∀ {m} (n : ℕ) → (m≥n : m ≥ n) → Flipper m

We pass in a proof that the natural number we’re converting from is indeed in range (it’s marked irrelevant so we don’t pay for it). Here’s a non-answer:

fromNat : ∀ {m} (n : ℕ) → {m≥n : m ≥ n} → Flipper m
fromNat n {m≥n} = n , m≥n

While this looks fine, it’s actually the inverse of what we want. We defined the inductive structure to be indicated by the inequality proof itself. Let’s make the desired output explicit:

toNat : ∀ {n m} → n ≥ m → ℕ
toNat m≥m = zero
toNat (m≥p n≥m) = suc (toNat n≥m)

fromNat-≡ : ∀ {n} m
          → .(n≥m : n ≥ m)
          →  Σ[ n-m ∈ Flipper n ] toNat (proj₂ n-m) ≡ m

And finally we can try an implementation:

fromNat-≡ zero    _   = (_ , m≥m) , refl
fromNat-≡ (suc m) n≥m = ??? (fromNat-≡ m (m≥p n≥m))

In the ??? there, we want some kind of successor function. The problem is that we would also need to prove that we can do a successor call. Except we don’t want to do that: proving that there’s space left is an expensive operation, and one we can avoid with another trick: first, we assume that there’s space left.

fromNat-≡ zero    n≥m = ( _ , m≥m) , refl
fromNat-≡ (suc n) n≥m with fromNat-≡ n (m≥p n≥m)
... | (suc space , n-1), x≡m  = (space , m≥p n-1), cong suc x≡m
... | (zero      , n-1), refl = ???

But what about the second case? Well, we have to prove this impossible. What if it’s an extremely complex, expensive proof? It doesn’t matter! It will never be run! In contrast to proving the “happy path” correct, if we can confine all of the ugly complex cases to the unhappy paths, we can spend as long as we want proving them impossible without having to worry about runtime cost. Here’s the full function.

fromNat implementation

A Very Simple Prime Sieve in Haskell

Donnacha Oisín Kidney — Sat, 10 Nov 2018 00:00:00 UT

Posted on November 10, 2018

Part 1 of a 2-part series on Prime Sieves

Tags: Haskell

A few days ago, the Computerphile YouTube channel put up a video about infinite lists in Haskell (Haran 2018). It’s pretty basic, but finishes up with a definition of an infinite list of prime numbers. The definition was something like this:

primes = sieve [2..]

sieve (p:ps) = p : sieve [ x | x <- ps, mod x p /= 0 ]

This really demonstrates the elegance of list comprehensions coupled with lazy evaluation. If we’re being totally pedantic, however, this isn’t a genuine sieve of Eratosthenes. And this makes sense: the “true” sieve of Eratosthenes (O’Neill 2009) is probably too complex to demonstrate in a video meant to be an introduction to Haskell. This isn’t because Haskell is bad at this particular problem, mind you: it’s because a lazy, infinite sieve is something very hard to implement indeed.

Anyway, I’m going to try today to show a very simple prime sieve that (hopefully) rivals the simplicity of the definition above.

A First Attempt

Visualizations of the sieve of Eratosthenes often rely on metaphors of “crossing out” on some large table. Once you hit a prime, you cross off all of its multiples in the rest of the table, and then you move to the next crossed-off number.

Working with a finite array, it should be easy to see that this is extremely efficient. You’re crossing off every non-prime exactly once, only using addition and squaring.

To extend it to infinite lists, we will use the following function:

[] \\ ys = []
xs \\ [] = xs
(x:xs) \\ (y:ys) = case compare x y of
    LT -> x : xs \\ (y:ys)
    EQ -> xs \\ ys
    GT -> (x:xs) \\ ys

We’re “subtracting” the right list from the left. Crucially, it works with infinite lists:

>>> take 10 ([1..] \\ [2,4..])
[1,3,5,7,9,11,13,15,17,19]

Finally, it only works if both lists are ordered and don’t contain duplicates, but our sieve does indeed satisfy that requirement. Using this, we’ve already got a sieve:

sieve (p:ps) = p : sieve (ps \\ [p*p, p*p+p..])
primes = 2 : sieve [3,5..]

No division, just addition and squaring, as promised. Unfortunately, though, this doesn’t have the time complexity we want. See, in the (\\) operation, we have to test every entry in the sieve against the prime factor: when we’re crossing off from an array, we just jump to the next composite number.

Using a Queue

The way we speed up the “crossing-off” section of the algorithms is by using a priority queue: this was the optimization provided in O’Neill (2009). Before we go any further, then, let’s put one together:

infixr 5 :-
data Queue a b = Queue
    { minKey :: !a
    , minVal :: b
    , rest   :: List a b
    }

data List a b
    = Nil
    | (:-) {-# UNPACK #-} !(Queue a b)
           (List a b)


(<+>) :: Ord a => Queue a b -> Queue a b -> Queue a b
(<+>) q1@(Queue x1 y1 ts1) q2@(Queue x2 y2 ts2)
  | x1 <= x2 = Queue x1 y1 (q2 :- ts1)
  | otherwise = Queue x2 y2 (q1 :- ts2)

mergeQs :: Ord a => List a b -> Queue a b
mergeQs (t :- ts) = mergeQs1 t ts
mergeQs Nil       = errorWithoutStackTrace "tried to merge empty list"

mergeQs1 :: Ord a => Queue a b -> List a b -> Queue a b
mergeQs1 t1 Nil              = t1
mergeQs1 t1 (t2 :- Nil)      = t1 <+> t2
mergeQs1 t1 (t2 :- t3 :- ts) = (t1 <+> t2) <+> mergeQs1 t3 ts

insert :: Ord a => a -> b -> Queue a b -> Queue a b
insert !k !v = (<+>) (singleton k v)

singleton :: a -> b -> Queue a b
singleton !k !v = Queue k v Nil

These are pairing heaps: I’m using them here because they’re relatively simple and very fast. A lot of their speed comes from the fact that the top-level constructor (Queue) is non-empty. Since, in this algorithm, we’re only actually going to be working with non-empty queues, this saves us a pattern match on pretty much every function. They’re also what’s used in Data.Sequence for sorting.

With that, we can write our proper sieve:

insertPrime x xs = insert (x*x) (map (*x) xs)

adjust x q@(Queue y (z:zs) qs)
  | y <= x = adjust x (insert z zs (mergeQs qs))
  | otherwise = q

sieve (x:xs) = x : sieve' xs (singleton (x*x) (map (*x) xs))
  where
    sieve' (x:xs) table
      | minKey table <= x = sieve' xs (adjust x table)
      | otherwise = x : sieve' xs (insertPrime x xs table)
      
primes = 2 : sieve [3,5..]

Simplifying

The priority queue stores lists alongside their keys: what you might notice is that those lists are simply sequences of the type $[x, 2x, 3x, 4x...]$ and so on. Rather than storing the whole list, we can instead store just the head and the step. This also simplifies (and greatly speeds up) the expensive map (*x) operation to just two multiplications. If you wanted, you could just sub in this representation of streams for all the lists above:

data Stepper a = Stepper { start :: a, step :: a }

nextStep :: Num a => Stepper a -> (a, Stepper a)
nextStep (Stepper x y) = (x, Stepper (x+y) y)

pattern x :- xs <- (nextStep -> (x,xs))

(^*) :: Num a => Stepper a -> a -> Stepper a
Stepper x y ^* f = Stepper (x * f) (y * f)

If you were so inclined, you could even make it conform to Foldable:

data Stepper a where
    Stepper :: Num a => a -> a -> Stepper a

nextStep (Stepper x y) = (x, Stepper (x+y) y)

pattern x :- xs <- (nextStep -> (x,xs))

instance Foldable Stepper where
    foldr f b (x :- xs) = f x (foldr f b xs)

But that’s overkill for what we need here.

Second observation is that if we remove the wheel (from 2), the “start” is simply the key in the priority queue, again cutting down on space.

Finally, we get the implementation:

primes = 2 : sieve 3 (singleton 4 2)
  where
    adjust !x q@(Queue y z qs)
        | x < y = q
        | otherwise = adjust x (mergeQs1 (singleton (y + z) z) qs)
    sieve !x q
        | x < minKey q = x : sieve (x + 1) (insert (x * x) x q)
        | otherwise = sieve (x + 1) (adjust x q)

8 lines for a lazy prime sieve isn’t bad!

I haven’t tried a huge amount to optimize the function, but it might be worth looking in to how to add back the wheels. I noticed that for no wheels, the queue contains only two elements per key; for one (the 2 wheel), we needed 3. I wonder if this pattern continues: possibly we could represent wheels as finite lists at each key in the queue. Maybe in a later post.

Haran, Brady. 2018. “To Infinity & Beyond - Computerphile.” https://www.youtube.com/watch?v=bnRNiE_OVWA&feature=youtu.be.

O’Neill, Melissa E. 2009. “The Genuine Sieve of Eratosthenes.” Journal of Functional Programming 19 (01) (January): 95. doi:10.1017/S0956796808007004.

Total Combinations

Donnacha Oisín Kidney — Tue, 16 Oct 2018 00:00:00 UT

Posted on October 16, 2018

Part 1 of a 1-part series on Total Combinatorics

Tags: Agda, Haskell

Here’s a quick puzzle: from a finite alphabet, produce an infinite list of infinite strings, each of them unique.

It’s not a super hard problem, but here are some examples of what you might get. Given the alphabet of 0 and 1, for instance, you could produce the following:

0000000...
1000000...
0100000...
1100000...
0010000...
1010000...
0110000...
1110000...
0001000...

In other words, the enumeration of the binary numbers (least-significant-digit first). We’ll just deal with bits first:

data Bit = O | I

instance Show Bit where
    showsPrec _ O = (:) '0'
    showsPrec _ I = (:) '1'
    showList xs s = foldr f s xs
      where
        f O a = '0' : a
        f I a = '1' : a

Thinking recursively, we can see that the tail of each list is actually the original sequence, doubled-up:

0000000... 1000000... 0100000... 1100000... 0010000... 1010000... 0110000... 1110000... 0001000...

As it happens, we get something like this pattern with the monad instance for lists anyway:

>>> (,) <$> [O,I] <*> "abc"
[(0,'a'),(0,'b'),(0,'c'),(1,'a'),(1,'b'),(1,'c')]

Well, actually it’s the wrong way around. We want to loop through the first list the quickest, incrementing the second slower. No worries, we can just use a flipped version of <*>:

infixl 4 <<>
(<<>) :: Applicative f => f (a -> b) -> f a -> f b
fs <<> xs = flip ($) <$> xs <*> fs

>>> (,) <$> [O,I] <<> "abc"
[(0,'a'),(1,'a'),(0,'b'),(1,'b'),(0,'c'),(1,'c')]

Brilliant! So we can write our function now, yes?

bins = (:) <$> [O,I] <<> bins

Nope! That won’t ever produce an answer, unfortunately.

Productivity

The issue with our definition above is that it’s not lazy enough: it demands information that it hasn’t produced yet, so it gets caught in an infinite loop before it can do anything!

We need to kick-start it a little, so it can produce output before it asks itself for more. Because we know what the first line is going to be, we can just tell it that:

bins = (:) <$> [O,I] <<> (repeat O : tail bins)

>>> mapM_ print (take 8 (map (take 3) bins))
000
100
010
110
001
101
011
111

The property that this function has that the previous didn’t is productivity: the dual of termination. See, we want to avoid a kind of infinite loops in bins, but we don’t want to avoid infinite things altogether: the list it produces is meant to be infinite, for goodness’ sake. Instead, what it needs to do is produce every new value in finite time.

Checking Productivity

In total languages, like Agda, termination checking is a must. To express computation like that above, though, you often also want a productivity checker. Agda can do that, too.

Let’s get started then. First, a stream:

infixr 5 _◂_
record Stream {a} (A : Set a) : Set a where
  coinductive
  constructor _◂_
  field
    head : A
    tail : Stream A
open Stream

In Haskell, there was no need to define a separate stream type: the type of lists contains both finite and infinite lists.

Agda can get a little more specific: here, we’ve used the coinductive keyword, which means we’re free to create infinite Streams. Rather than the usual termination checking (which would kick in when we consume a recursive, inductive type), we now get productivity checking: when creating a Stream, the tail must always be available in finite time. For a finite type, we’d have used the inductive keyword instead; this wouldn’t be much use, though, since there’s no way to create a finite Stream without a nil constructor!¹

One of the interesting things about working with infinite data (when you’re forced to notice that it’s infinite, as you are in Agda) is that everything gets flipped. So you have to prove productivity, not totality; you use product types, rather than sums; and to define functions, you use copatterns, rather than patterns.

Copatterns

Copatterns are a handy syntactic construct for writing functions about record types. Let’s start with an example, and then I’ll try explain a little:

pure : ∀ {a} {A : Set a} → A → Stream A
head (pure x) = x
tail (pure x) = pure x

Here, we’re defining pure on streams: pure x produces an infinite stream of x. Its equivalent would be repeat in Haskell:

repeat :: a -> [a]
repeat x = x : repeat x

Except instead of describing what it is, you describe how it acts (it’s kind of an intensional vs. extensional thing). In other words, if you want to make a stream xs, you have to answer the questions “what’s the head of xs?” and “what’s the tail of xs?”

Contrast this with pattern-matching: we’re producing (rather than consuming) a value, and in pattern matching, you have to answer a question for each case. If you want to consume a list xs, you have to answer the questions “what do you do when it’s nil?” and “what do you do when it’s cons?”

Anyway, I think the symmetry is kind of cool. Let’s get back to writing our functions.

Sized Types

Unfortunately, we don’t have enough to prove productivity yet. As an explanation why, let’s first try produce the famous fibs list. Written here in Haskell:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

Instead of zipWith, let’s define <*>. That will let us use idiom brackets.

_<*>_ : ∀ {a b} {A : Set a} {B : Set b}
      → Stream (A → B)
      → Stream A
      → Stream B
head (fs <*> xs) = head fs (head xs)
tail (fs <*> xs) = tail fs <*> tail xs

And here’s fibs:

fibs : Stream ℕ
head fibs = 0
head (tail fibs) = 1
tail (tail fibs) = ⦇ fibs + tail fibs ⦈

But it doesn’t pass the productivity checker! Because we use a higher-order function (<*>), Agda won’t look at how much it dips into the infinite supply of values. This is a problem: we need it to know that <*> only needs the heads of its arguments to produce a head, and so on. The solution? Encode this information in the types.

infixr 5 _◂_
record Stream {i : Size} {a} (A : Set a) : Set a where
  coinductive
  constructor _◂_
  field
    head : A
    tail : ∀ {j : Size< i} → Stream {j} A
open Stream

Now, Stream has an implicit size parameter. Basically, Stream {i} A can produce i more values. So cons, then, gives a stream one extra value to produce:

cons : ∀ {i a} {A : Set a} → A → Stream {i} A → Stream {↑ i} A
head (cons x xs) = x
tail (cons x xs) = xs

Conversely, we can write a different definition of tail that consumes one value²:

tail′ : ∀ {i a} {A : Set a} → Stream {↑ i} A → Stream {i} A
tail′ {i} xs = tail xs {i}

For <*>, we want to show that its result can produce just as much values as its inputs can:

_<*>_ : ∀ {i a b} {A : Set a} {B : Set b}
      → Stream {i} (A → B)
      → Stream {i} A
      → Stream {i} B
head (fs <*> xs) = head fs (head xs)
tail (fs <*> xs) = tail fs <*> tail xs

How does this help the termination/productivity checker? Well, for terminating functions, we have to keep giving the tail field smaller and smaller sizes, meaning that we’ll eventually hit zero (and terminate). For productivity, we now have a way to talk about “definedness” in types, so we can make sure that a recursive call doesn’t dip into a supply it hasn’t produced yet.

One more thing: Size types have strange typing rules, mainly for ergonomic purposes (this is why we’re not just using an ℕ parameter). One of them is that if you don’t specify the size, it’s defaulted to ∞, so functions written without size annotations don’t have to be changed with this new definition:

pure : ∀ {a} {A : Set a} → A → Stream A
head (pure x) = x
tail (pure x) = pure x

Finally fibs:

fibs : ∀ {i} → Stream {i} ℕ
head fibs = 0
head (tail fibs) = 1
tail (tail fibs) = ⦇ fibs + tail fibs ⦈

Bugs!

Before I show the Agda solution, I’d like to point out some bugs that were revealed in the Haskell version by trying to implement it totally. First of all, the function signature. “Takes an alphabet and produces unique strings” seems like this:

strings :: [a] -> [[a]]

But what should you produce in this case:

strings []

So it must be a non-empty list, giving us the following type and definition:

strings :: NonEmpty a -> [[a]]
strings (x :| xs) = (:) <$> (x:xs) <<> (repeat x : tail (strings (x :| xs)))

But this has a bug too! What happens if we pass in the following:

strings (x :| [])

So this fails the specification: there is only one unique infinite string from that alphabet (pure x). Interestingly, though, our implementation above also won’t produce any output beyond the first element. I suppose, in a way, these things cancel each other out: our function does indeed produce all of the unique strings, it’s just a pity that it goes into an infinite loop to do so!

Bringing it all Together

Finally, we have our function:

strings : ∀ {i a} {A : Set a} → A × A × List A → Stream {i} (Stream A)
head (strings (x , _ , _)) = pure x
tail (strings {A = A} xs@(x₁ , x₂ , xt)) = go x₂ xt (strings xs)
  where
  go : ∀ {i} → A → List A → Stream {i} (Stream A) → Stream {i} (Stream A)
  head (head (go y ys zs)) = y
  tail (head (go y ys zs)) = head zs
  tail (go _ [] zs) = go x₁ (x₂ ∷ xt) (tail zs)
  tail (go _ (y ∷ ys) zs) = go y ys zs

As you can see, we do need to kick-start it without a recursive call (the first line is pure x). Then, go takes as a third argument the “tails” argument, and does the kind of backwards Cartesian product we want. However, since we’re into the second element of the stream now, we want to avoid repeating what we already said, which is why we have to give go x₂, rather than x₁. This is what forces us to take at least two elements, rather than at least one, also: we can’t just take the tail of the call to go (this is what we did in the Haskell version of strings with the NonEmpty list), as the recursive call to strings then doesn’t decrease in size:

strings : ∀ {i a} {A : Set a} → A × List A → Stream {i} (Stream A)
head (strings (x , _)) = pure x
tail (strings {A = A} xs@(x , xt)) = tail (go x xt (strings xs))
  where
  go : ∀ {i} → A → List A → Stream {i} (Stream A) → Stream {i} (Stream A)
  head (head (go y ys zs)) = y
  tail (head (go y ys zs)) = head zs
  tail (go _ [] zs) = go x xt (tail zs)
  tail (go _ (y ∷ ys) zs) = go y ys zs

Agda will warn about termination on this function. Now, if you slap a pragma on it, it will produce the correct results for enough arguments, but give it one and you’ll get an infinite loop, just as you were warned!

Further Work

I’m having a lot of fun with copatterns for various algorithms (especially combinatorics). I’m planning on working on two particular tasks with them for the next posts in this series:

Proving strings: I’d like to prove that strings does indeed produce a stream of unique values. Following from that, it would be cool to do a Cantor diagonalisation on its output.
Permutations: Haskell’s permutations implementation in Data.List does some interesting tricks to make it as lazy as possible. It would be great to write an implementation that is verified to be as lazy as possible: the pattern of “definedness” is complex, though, so I don’t know if it’s possible with Agda’s current sized types.

Thanks to gelisam for pointing out the poor phrasing here. Updated on 2018/10/16↩︎
You might wonder why the definition of tail doesn’t have this signature to begin with. The reason is that our record type must be parameterized (not indexed) over its size (as it’s a record type), so we use a less-than proof instead.↩︎

Agda Beginner(-ish) Tips, Tricks, and Pitfalls

Donnacha Oisín Kidney — Thu, 20 Sep 2018 00:00:00 UT

Posted on September 20, 2018

Part 1 of a 2-part series on Agda Tips

Tags: Agda

I’m in the middle of quite a large Agda project at the moment, and I’ve picked up a few tips and tricks in the past few weeks. I’d imagine a lot of these are quite obvious once you get to grips with Agda, so I’m writing them down before I forget that they were once confusing stumbling blocks. Hopefully this helps other people trying to learn the language!

Parameterized Modules Strangeness

Agda lets you parameterize modules, just as you can datatypes, with types, values, etc. It’s extremely handy for those situations where you want to be generic over some type, but that type won’t change inside the generic code. The keys to dictionaries is a good example: you can start the module with:

module Map (Key : Set) (Ordering : Ord Key) where

And now, where in Haskell you’d have to write something like Ord a => Map a… in pretty much any function signature, you can just refer to Key, and you’re good to go. It’s kind of like a dynamic type synonym, in that way.

Here’s the strangeness, though: what if you don’t supply one of the arguments?

import Map

This won’t give you a type error, strange as it may seem. This will perform lambda lifting, meaning that now, every function exported by the module will have the type signature:

(Key : Set) (Ordering : Ord Key) ...

Preceding its normal signature. In other words, it changes it into what you would have had to write in Haskell.

This is a powerful feature, but it can also give you some confusing errors if you don’t know about it (especially if the module has implicit arguments).

Auto

If you’ve got a hole in your program, you can put the cursor in it and press SPC-m-a (in spacemacs), and Agda will try and find the automatic solution to the problem. For a while, I didn’t think much of this feature, as rare was the program which Agda could figure out. Turns out I was just using it wrong! Into the hole you should type the options for the proof search: enabling case-splitting (-c), enabling the use of available definitions (-r), and listing possible solutions (-l).

Well-Founded Recursion

Often, a program will not be obviously terminating (according to Agda’s termination checker). The first piece of advice is this: don’t use well-founded recursion. It’s a huge hammer, and often you can get away with fiddling with the function (try inlining definitions, rewriting generic functions to monomorphic versions, or replacing with-blocks with helper functions), or using one of the more lightweight techniques out there.

However, sometimes it really is the best option, so you have to grit your teeth and use it. What I expected (and what I used originally) was a recursion combinator, with a type something like:

wf-rec : ∀ {a b} {A : Set a} {B : Set b}
       → ((x : A) → ((y : A) → y < x → B) → B)
       → A → B

So we’re trying to generate a function of type A → B, but there’s a hairy recursive call in there somewhere. Instead we use this function, and pass it a version of our function that uses the supplied function rather than making a recursive call:

terminating : A → B
terminating = wf-rec (λ x recursive-call → ...)

In other words, instead of calling the function itself, you call recursive-call above. Along with the argument, you supply a proof that it’s smaller than the outer argument (y < x; assume for now that the definition of < is just some relation like _<_ in Data.Nat).

But wait! You don’t have to use it! Instead of all that, you can just pass the Acc _<_ x type as a parameter to your function. In other words, if you have a dangerous function:

f : A → B

Instead write:

f-step : (x : A) → Acc _<_ x → B
f-step = ...

f : A → B
f x = f-step x ...

Once you pattern match on the accessibility relation, the termination checker is satisfied. This is much easier to understand (for me anyway), and made it much easier to write proofs about it.

Thanks to Oleg Grenrus (phadej) on irc for helping me out with this! Funnily enough, he actually recommended the Acc approach, and I instead originally went with the recursion combinator. Would have saved a couple hours if I’d just listened! Also worth mentioning is the approach recommended by Guillaume Allais (gallais), detailed here. Haven’t had time to figure it out, so this article may be updated to recommend it instead in the future.

Don’t Touch The Green Slime!

This one is really important. If I hadn’t read the exact explanation here I think I may have given up with Agda (or at the very least the project I’m working on) out of frustration.

Basically the problem arises like this. Say you’re writing a function to split a vector in two. You can specify the type pretty precisely:

split : ∀ {a n m} {A : Set a} → Vec A (n + m) → Vec A n × Vec A m
split xs = {!!}

Try to pattern-match on xs, though, and you’ll get the following error:

I'm not sure if there should be a case for the constructor [],
because I get stuck when trying to solve the following unification
problems (inferred index ≟ expected index):
  zero ≟ n + m
when checking that the expression ? has type Vec .A .n × Vec .A .m

What?! That’s weird. Anyway, you fiddle around with the function, end up pattern matching on the n instead, and continue on with your life.

What about this, though: you want to write a type for proofs that one number is less than or equal to another. You go with something like this:

infix 4 _≤_
data _≤_ (n : ℕ) : ℕ → Set where
  proof : ∀ k → n ≤ n + k

And you want to use it in a proof. Here’s the example we’ll be using: if two numbers are less than some limit u, then their maximum is also less than that limit:

max : ℕ → ℕ → ℕ
max zero m = m
max (suc n) zero = suc n
max (suc n) (suc m) = suc (max n m)

max-≤ : ∀ n m {u} → n ≤ u → m ≤ u → max n m ≤ u
max-≤ n m (proof k) m≤u = {!!}

It won’t let you match on m≤u! Here’s the error:

I'm not sure if there should be a case for the constructor proof,
because I get stuck when trying to solve the following unification
problems (inferred index ≟ expected index):
  m₁ + k₂ ≟ n₁ + k₁
when checking that the expression ? has type max n m ≤ n + k

What do you mean you’re not sure if there’s a case for the constructor proof: it’s the only case!

The problem is that Agda is trying to unify two types who both have calls to user-defined functions in them, which is a hard problem. As phrased by Conor McBride:

When combining prescriptive and descriptive indices, ensure both are in constructor form. Exclude defined functions which yield difficult unification problems.

So if you ever get the “I’m not sure if…” error, try either to:

Redefine the indices so they use constructors, not functions.
Remove the index, instead having a proof inside the type of equality. What does that mean? Basically, transform the definition of ≤ above into the one in Data.Nat.

Inspect

The use-case I had for this is a little long, I’m afraid (too long to include here), but it did come in handy. Basically, if you’re trying to prove something about a function, you may well want to run that function and pattern match on the result.

f-is-the-same-as-g : ∀ x → f x ≡ g x
f-is-the-same-as-g x with f x
f-is-the-same-as-g x | y = {!!}

This is a little different from the normal way of doing things, where you’d pattern match on the argument. It is a pattern you’ll sometimes need to write, though. And here’s the issue: that y has nothing to do with f x, as far as Agda is concerned. All you’ve done is introduced a new variable, and that’s that.

This is exactly the problem inspect solves: it runs your function, giving you a result, but also giving you a proof that the result is equal to running the function. You use it like this:

f-is-the-same-as-g : ∀ x → f x ≡ g x
f-is-the-same-as-g x with f x | inspect f x
f-is-the-same-as-g x | y | [ fx≡y ] = {!!}

SPC-G-G

Because the Agda standard library is a big fan of type synonyms (Op₂ A instead of A → A → A for example), it’s handy to know that pressing SPC-G-G (in spacemacs) over any identifier will bring you to the definition. Also, you can normalize a type with SPC-m-n.

Irrelevance

This one is a little confusing, because Agda’s notion of “irrelevance” is different from Idris’, or Haskell’s. In all three languages, irrelevance is used for performance: it means that a value doesn’t need to be around at runtime, so the compiler can elide it.

That’s where the similarities stop though. In Haskell, all types are irrelevant: they’re figments of the typechecker’s imagination. You can’t get a type at runtime full stop.

In dependently typed languages, this isn’t a distinction we can rely on. The line between runtime entities and compile-time entities is drawn elsewhere, so quite often types need to exist at runtime. As you might guess, though, they don’t always need to. The length of a length-indexed vector, for instance, is completely determined by the structure of the vector: why would you bother storing all of that information at runtime? This is what Idris recognizes, and what it tries to remedy: it analyses code for these kinds of opportunities for elision, and does so when it can. Kind of like Haskell’s fusion, though, it’s an invisible optimization, and there’s no way to make Idris throw a type error when it can’t elide something you want it to elide.

Agda is totally different. Something is irrelevant in Agda if it’s unique. Or, rather, it’s irrelevant if all you rely on is its existence. It’s used for proofs that you carry around with you: in a rational number type, you might use it to say that the numerator and denominator have no common factors. The only information you want from this proof is whether it holds or not, so it’s the perfect candidate for irrelevance.

Weirdly, this means it’s useless for the length-indexed vector kind of stuff mentioned above. In fact, it doe exactly the opposite of what you might expect: if the length parameter is marked as irrelevant, the the types Vec A n and Vec A (suc n) are the same!

The way you can use it is to pattern-match if it’s impossible. Again, it’s designed for eliding proofs that you may carry with you otherwise.

Future Tips

Once I’m finished the project, I’ll try write up a guide on how to do literate Agda files. There were a couple of weird nuances that I had to pick up on the way, mainly to do with getting unicode to work.

Verified AVL Trees in Haskell and Agda

Donnacha Oisín Kidney — Mon, 30 Jul 2018 00:00:00 UT

Posted on July 30, 2018

Tags: Haskell, Agda

I’ve been writing a lot of Agda recently, and had the occasion to write a Fenwick tree that did some rebalancing. I went with AVL-style rebalancing (rather than red-black or trees of bounded balance). I’d written pretty full implementations of the other two before, and the Agda standard library (Danielsson 2018) has an implementation already that I was able to use as a starting point. Also, apparently, AVL trees seem to perform better than red-black trees in practice (Pfaff 2004).

This post will be similar in style to Stephanie Weirich’s talk (2014), which compares an Agda implementation of verified red-black trees to a Haskell one. When there’s two columns of code side-by-side, the left-hand side is Haskell, the right Agda.

The method of constructing the ordering proof is taken from “How to Keep Your Neighbours in Order” (2014) by Conor McBride; the structural proofs are somewhat inspired by the implementation in the Agda standard library, but are mainly my own.

Height

AVL trees are more strictly balanced than red-black trees: the height of neighboring subtrees can differ by at most one. To store the height, we will start as every dependently-typed program does: with Peano numbers.

Haskell

data N = Z | S N

Agda

data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ

The trees will be balanced one of three possible ways: left-heavy, right-heavy, or even. We can represent these three cases in a GADT in the case of Haskell, or an indexed datatype in the case of Agda:

data Balance :: N -> N -> N -> Type where
      L :: Balance (S n) n    (S n)
      O :: Balance  n    n     n
      R :: Balance  n   (S n) (S n)

data ⟨_⊔_⟩≡_ : ℕ → ℕ → ℕ → Set where
  ◿  : ∀ {n} → ⟨ suc  n ⊔      n ⟩≡ suc  n
  ▽  : ∀ {n} → ⟨      n ⊔      n ⟩≡      n
  ◺  : ∀ {n} → ⟨      n ⊔ suc  n ⟩≡ suc  n

Those unfamiliar with Agda might be a little intimidated by the mixfix operator in the balance definition: we’re using it here because the type can be seen of a proof that:

$max(x,y) = z$

Or, using the $\sqcup$ operator:

$(x \sqcup y) = z$

We’ll use this proof in the tree itself, as we’ll need to know the maximum of the height of a node’s two subtrees to find the height of the node. Before we do that, we’ll need a couple helper functions for manipulating the balance:

balr :: Balance x y z -> Balance z x z
balr L = O
balr O = O
balr R = L

ball :: Balance x y z -> Balance y z z
ball L = R
ball O = O
ball R = O

⃕ : ∀ {x y z} → ⟨ x ⊔ y ⟩≡ z → ⟨ z ⊔ x ⟩≡ z
⃕  ◿  = ▽
⃕  ▽  = ▽
⃕  ◺  = ◿

⃔ : ∀ {x y z} → ⟨ x ⊔ y ⟩≡ z → ⟨ y ⊔ z ⟩≡ z
⃔  ◿  = ◺
⃔  ▽  = ▽
⃔  ◺  = ▽

Ordering

Along with the verification of the structure of the tree, we will also want to verify that its contents are ordered correctly. Unfortunately, this property is a little out of reach for Haskell, but it’s 100% doable in Agda. First, we’ll need a way to describe orders on a data type. In Haskell, we might write:

class Ord a where
  (==) :: a -> a -> Bool
  (<)  :: a -> a -> Bool

***

That Bool throws away any information gained in the comparison, though: we want to supply a proof with the result of the comparison. First, equality:

infix 4 ==
data (==) :: Type
          -> Type
          -> Type where
  Refl :: x == x

infix 4 _≡_
data _≡_ {a} {A : Set a}
         (x : A) 
         : A → Set a where
  refl : x ≡ x

This is one of the many ways to describe equality in Agda. It’s a type with only one constructor, and it can only be constructed when its two arguments are the same. When we pattern match on the constructor, then, we’re given a proof that whatever things those arguments refer to must be the same.

Next, we need to describe an order. For this, we’ll need two types: the empty type, and the unit type.

data Void :: Type where
data Unit :: Type where Unit :: Unit

data ⊥ : Set where
data ⊤ : Set where ⟨⟩ : ⊤

These are kind of like type-level Bools, with one extra, powerful addition: they keep their proof after construction. Because ⊥ has no constructors, if someone tells you they’re going to give you one, you can be pretty sure they’re lying. How do we use this? Well, first, on the numbers:

type family (n :: N) < (m :: N)
    :: Type where
  x   < Z   = Void
  Z   < S y = Unit
  S x < S y = x < y

_ℕ<_ : ℕ → ℕ → Set
x     ℕ< zero  = ⊥
zero  ℕ< suc y = ⊤
suc x ℕ< suc y = x ℕ< y

Therefore, if we ask for something of type x ℕ< y (for some x and y), we know that it only exists when x really is less than y (according to the definition above).

For our actual code, we’ll parameterize the whole thing over some abstract key type. We’ll do this using a module (a feature recently added to Haskell, as it happens). That might look something like this:

signature Key where
  import Data.Kind
  data Key
  type family (n :: Key) < (m :: Key)
    :: Type

module AVL where
  import Key

module AVL
  {k r} (Key : Set k)
  {_<_ : Rel Key r}
  (isStrictTotalOrder
   : IsStrictTotalOrder _≡_ _<_)
  where

  open IsStrictTotalOrder isStrictTotalOrder

(the k and r here, as well as the Lifting noise below, are to do with Agda’s universe system, which I’ll try explain in a bit)

Now, the trick for the ordering is to keep a proof that two neighboring values are ordered correctly in the tree at each leaf (as there’s a leaf between every pair of nodes, this is exactly the place you should store such a proof). A problem arises with the extremal leaves in the tree (leftmost and rightmost): each leaf is missing one neighboring value, so how can it store a proof of order? The solution is to affix two elements to our key type which we define as the greatest and least elements of the set.



data Bound = LB | IB Key | UB



infix 4 <:

type family (x :: Bound) <: (y :: Bound)
    :: Type where
  LB   <: LB   = Void
  LB   <: UB   = Unit
  LB   <: IB _ = Unit
  UB   <: _    = Void
  IB _ <: LB   = Void
  IB _ <: UB   = Unit
  IB x <: IB y = x < y

infix 5 [_]

data [∙] : Set k where
  ⌊⌋ ⌈⌉ : [∙]
  [_]   : (k : Key) → [∙]

infix 4 _[<]_

_[<]_ : [∙] → [∙] → Set r
⌊⌋     [<] ⌊⌋    = Lift r ⊥
⌊⌋     [<] ⌈⌉    = Lift r ⊤
⌊⌋     [<] [ _ ] = Lift r ⊤
⌈⌉     [<] _     = Lift r ⊥
[ _ ]  [<] ⌊⌋    = Lift r ⊥
[ _ ]  [<] ⌈⌉    = Lift r ⊤
[ x ]  [<] [ y ] = x < y

The Tree Type

After all that, we can get bring back Haskell into the story, and define or tree types:


data Tree :: N
          -> Type
          -> Type
          -> Type where
  Leaf :: Tree Z k v
  Node :: k
       -> v
       -> Balance lh rh h
       -> Tree lh k v
       -> Tree rh k v
       -> Tree (S h) k v

data Tree {v} 
          (V : Key → Set v)
          (l u : [∙]) : ℕ →
          Set (k ⊔ v ⊔ r) where
  leaf  : (l: l [<] u) → Tree V l u 0
  node  : ∀  {h lh rh}
             (k : Key)
             (v : V k)
             (bl : ⟨ lh ⊔ rh ⟩≡ h)
             (lk : Tree V l [ k ] lh)
             (ku : Tree V [ k ] u rh) →
             Tree V l u (suc h)

The two definitions are similar, but have a few obvious differences. The Agda version stores the ordering proof at the leaves, as well as the bounds as indices. Its universe is also different: briefly, universes are one of the ways to avoid Russell’s paradox when you’re dealing with dependent types.

In normal, standard Haskell, we think of types as things that describe values (how quaint!). When you’ve got a list, everything in the list has the same type, and that is good and right.

These days, though, we’re not so constrained:

infixr 5 :-
data List xs where
  Nil  :: List '[]
  (:-) :: x
       -> List xs
       -> List (x : xs)

infixr 5 _፦_
data List′ : List Set → Set where
  nil : List′ []
  _፦_ : ∀ {x xs}
      → x 
      → List′ xs 
      → List′ (x ∷ xs)

This can quite happily store elements of different types:

example :: List [Bool, String, Integer]
example = True :- "true" :- 1 :- Nil

example : List′ (Bool ∷ String ∷ ℕ ∷ [])
example = true ፦ "true" ፦ 1 ፦ nil

And look at that bizarre-looking list on the wrong side of “::”! Types aren’t just describing values, they’re acting like values themselves. What type does [Bool, String, Integer] even have, anyway? Why, [Type] of course!

So we see that types can be put in lists, and types have types: the natural question then is:

Type :: ???

Set : ???

And this is where Haskell and Agda diverge: in Haskell, we say Type :: Type (as the old extension TypeInType implied), and that’s that. From a certain point of view, we’ve opened the door to Russell’s paradox (we’ve allowed a set to be a member of itself). This isn’t an issue in Haskell, though, as the type-level language was already inconsistent.

Agda goes another way, saying that Set (Agda’s equivalent for Type) has the type Set₁, and Set₁ has the type Set₂, and so on¹. These different sets are called “universes” and their numbers “levels”. When we write k ⊔ v ⊔ r, we’re saying we want to take the greatest universe level from those three possible levels: the level of the key, the value, and the relation, respectively.

Type :: Type

Set : Set₁

Rotations

AVL trees maintain their invariants through relatively simple rotations. We’ll start with the right rotation, which fixes an imbalance of two on the left. Because the size of the tree returned might change, we’ll need to wrap it in a datatype:

data (++?) :: (N -> Type)
           -> (N -> Type)
           where
  Stay :: t n     -> t ++? n
  Incr :: t (S n) -> t ++? n


_1?+⟨_⟩ : ∀ {𝓁} (T : ℕ → Set 𝓁) → ℕ → Set 𝓁
T 1?+⟨ n ⟩ = ∃[ inc? ] T (if inc?
                            then suc n
                            else n)

pattern 0+_ tr = false , tr
pattern 1+_ tr = true  , tr

We could actually have the Agda definition be the same as Haskell’s, it doesn’t make much difference. I’m mainly using it here to demonstrate dependent pairs in Agda. The first member of the pair is just a boolean (increased in height/not increased in height). The second member is a tree whose height depends on the actual value of the boolean. The ∃ business is just a fancy syntax; it also waggles its eyebrows at the way a (dependent) pair of type (x , y) means “There exists an x such that y”.

Using this, we can write the type for right-rotation:

rotr :: k
     -> v
     -> Tree (S (S rh)) k v
     -> Tree rh k v
     -> Tree k v ++? S (S rh)

rotʳ : ∀ {lb ub rh v} {V : Key → Set v}
     → (k : Key)
     → V k
     → Tree V lb [ k ] (suc (suc rh))
     → Tree V [ k ] ub rh
     → Tree V lb ub 1?+⟨ suc (suc rh) ⟩

There are two possible cases, single rotation:

   ┌a       ┌a
 ┌y┤       y┤
 │ └b --->  │ ┌b
x┤          └x┤
 └c           └c

rotr x xv (Node y yv L a b) c =
  Stay (Node y yv O a (Node x xv O b c))
rotr x xv (Node y yv O a b) c =
  Incr (Node y yv R a (Node x xv L b c))

rotʳ x xv (node y yv ◿ a b) c =
  0+ (node y yv ▽ a (node x xv ▽  b c))
rotʳ x xv (node y yv ▽ a b) c =
  1+ (node y yv ◺ a (node x xv ◿  b c))

And double:

   ┌a           ┌a
 ┌y┤          ┌y┤
 │ │ ┌b       │ └b
 │ └z┤  ---> z┤
 │   └c       │ ┌c
x┤            └x┤
 └d             └d

rotr x xv (Node y yv R a 
            (Node z zv bl b c)) d =
  Stay (Node z zv O 
         (Node y yv (balr bl) a b)
         (Node x xv (ball bl) c d))

rotʳ x xv (node y yv ◺  a
            (node z zv bl b c)) d =
  0+ (node z zv ▽
       (node y yv (⃕ bl) a b)
       (node x xv (⃔ bl) c d))

I won’t bore you with left-rotation: suffice to say, it’s the opposite of right-rotation.

Insertion

Finally, the main event: insertion. Once the above functions have all been defined, it’s not very difficult, as it happens: by and large, the types guide you to the right answer. Of course, this is only after we decided to use the pivotal pragmatism and balance approach.

insertWith
    :: Ord k
    => (v -> v -> v)
    -> k
    -> v
    -> Tree h k v
    -> Tree k v ++? h
insertWith _ v vc Leaf =
  Incr (Node v vc O Leaf Leaf)
insertWith f v vc (Node k kc bl tl tr) =
  case compare v k of
    LT ->
      case insertWith f v vc tl of
        Stay tl' ->
          Stay (Node k kc bl tl' tr)
        Incr tl' -> case bl of
          L -> rotr k kc tl' tr
          O -> Incr (Node k kc L tl' tr)
          R -> Stay (Node k kc O tl' tr)
    EQ ->
      Stay (Node v (f vc kc) bl tl tr)
    GT ->
      case insertWith f v vc tr of
        Stay tr' ->
          Stay (Node k kc bl tl tr')
        Incr tr' -> case bl of
          L -> Stay (Node k kc O tl tr')
          O -> Incr (Node k kc R tl tr')
          R -> rotl k kc tl tr'

insert : ∀ {l u h v}
           {V : Key → Set v}
           (k : Key)
       → V k
       → (V k → V k → V k)
       → Tree V l u h
       → l < k < u
       → Tree V l u 1?+⟨ h ⟩
insert v vc f (leaf l) (l , u) =
  1+ (node v vc ▽ (leaf l) (leaf u))
insert v vc f (node k kc bl tl tr) prf
  with compare v k
insert v vc f (node k kc bl tl tr) (l , _)
    | tri< a _ _ with insert v vc f tl (l , a)
... | 0+ tl′ = 0+ (node k kc bl tl′ tr)
... | 1+ tl′ with bl
... | ◿ = rotʳ k kc tl′ tr
... | ▽ = 1+ (node k kc  ◿  tl′ tr)
... | ◺ = 0+ (node k kc  ▽  tl′ tr)
insert v vc f (node k kc bl tl tr) _
    | tri≈ _ refl _ =
        0+ (node k (f vc kc) bl tl tr)
insert v vc f (node k kc bl tl tr) (_ , u)
    | tri> _ _ c with insert v vc f tr (c , u)
... | 0+ tr′ = 0+ (node k kc bl tl tr′)
... | 1+ tr′ with bl
... | ◿ = 0+ (node k kc ▽ tl tr′)
... | ▽ = 1+ (node k kc ◺ tl tr′)
... | ◺ = rotˡ k kc tl tr′

Conclusion

Overall, I’ve been enjoying programming in Agda. The things I liked and didn’t like surprised me:

Editor Support

Is excellent. I use spacemacs, and the whole thing worked pretty seamlessly. Proof search and auto was maybe not as powerful as Idris’, although that might be down to lack of experience (note—as I write this, I see you can enable case-splitting in proof search, so it looks like I was right about my lack of experience). In many ways, it was much better than Haskell’s editor support: personally, I have never managed to get case-splitting to work in my Haskell setup, never mind some of the fancier features that you get in Agda.

It’s worth noting that my experience with Idris is similar: maybe it’s something about dependent types?

Of course, I missed lots of extra tools, like linters, code formatters, etc., but the tight integration with the compiler was so useful it more than made up for it.

Also, I’d implore anyone who’s had trouble with emacs before to give spacemacs a go. It works well out-of-the-box, and has a system for keybinding discovery that actually works.

Documentation

Pretty good, considering. There are some missing parts (rewriting and telescopes are both stubs on the documentation site), but there seemed to be more fully worked-out examples available online for different concepts when I needed to figure them out.

Now, the thing about a lot of these complaints/commendations (especially with regards to tooling and personal setups) is that people tend to be pretty bad about evaluating how difficult finicky tasks like editor setups are. Once you’ve gotten the hang of some of this stuff, you forget that you ever didn’t. Agda is the second dependently-typed language I’ve really gone for a deepish dive on, and I’ve been using spacemacs for a while, so YMMV.

One area of the language itself that I would have liked to see more on was irrelevance. Looking back at the definition of the tree type, in the Haskell version there’s no singleton storing the height (the balance type stores all the information we need), which means that it definitely doesn’t exist at runtime. As I understand it, that implies that the type should be irrelevant in the equivalent Agda. However, when I actually mark it as irrelevant, everything works fine, except that missing cases warnings start showing up. I couldn’t figure out why: Haskell was able to infer full case coverage without the index, after all. Equality proof erasure, also: is it safe? Consistent?

All in all, I’d encourage more Haskellers to give Agda a try. It’s fun, interesting, and $\mathcal{Unicode}$ !

References

Danielsson, Nils Anders. 2018. “The Agda standard library.”

McBride, Conor Thomas. 2014. “How to Keep Your Neighbours in Order.” In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming, 297–309. ICFP ’14. New York, NY, USA: ACM. doi:10.1145/2628136.2628163.

Pfaff, Ben. 2004. “Performance Analysis of BSTs in System Software.” In Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, 410–411. SIGMETRICS ’04/Performance ’04. New York, NY, USA: ACM. doi:10.1145/1005686.1005742.

Weirich, Stephanie. 2014. “Depending on Types.” In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming, 241–241. ICFP ’14. New York, NY, USA: ACM. doi:10.1145/2628136.2631168.

My phrasing is maybe a little confusing here. When Set “has the type” Set₁ it means that Set is in Set₁, not the other way around.↩︎

Probabilistic Functional Programming

Donnacha Oisín Kidney — Tue, 17 Jul 2018 00:00:00 UT

Posted on July 17, 2018

Tags: Haskell, Probability

Here are the slides for a short talk I gave to a reading group I’m in at Harvard today. The speaker notes are included in the pdf, code and the tex is available in the repository.

Probability 5 Ways

Donnacha Oisín Kidney — Sat, 30 Jun 2018 00:00:00 UT

Posted on June 30, 2018

Tags: Probability, Haskell

Ever since the famous pearl by Erwig and Kollmansberger (2006), probabilistic programming with monads has been an interesting and diverse area in functional programming, with many different approaches.

I’m going to present five here, some of which I have not seen before.

The Classic

As presented in the paper, a simple and elegant formulation of probability distributions looks like this:

newtype Prob a
    = Prob
    { runProb :: [(a, Rational)]
    }

It’s a list of possible events, each tagged with their probability of happening. Here’s the probability distribution representing a die roll, for instance:

die :: Prob Integer
die = [ (x, 1/6) | x <- [1..6] ]

The semantics can afford to be a little fuzzy: it doesn’t hugely matter if the probabilities don’t add up to 1 (you can still extract meaningful answers when they don’t). However, I can’t see a way in which either negative probabilities or an empty list would make sense. It would be nice if those states were unrepresentable.

Its monadic structure multiplies conditional events:

instance Functor Prob where
    fmap f xs = Prob [ (f x, p) | (x,p) <- runProb xs ]
    
instance Applicative Prob where
    pure x = Prob [(x,1)]
    fs <*> xs
        = Prob
        [ (f x,fp*xp)
        | (f,fp) <- runProb fs
        , (x,xp) <- runProb xs ]
                     
instance Monad Prob where
    xs >>= f
        = Prob
        [ (y,xp*yp)
        | (x,xp) <- runProb xs
        , (y,yp) <- runProb (f x) ]

In most of the examples, we’ll need a few extra functions in order for the types to be useful. First is support:

support :: Prob a -> [a]
support = fmap fst . runProb

And second is expectation:

expect :: (a -> Rational) -> Prob a -> Rational
expect p xs = sum [ p x * xp | (x,xp) <- runProb xs ]

probOf :: (a -> Bool) -> Prob a -> Rational
probOf p = expect (bool 0 1 . p)

It’s useful to be able to construct uniform distributions:

uniform xs = Prob [ (x,n) | x <- xs ]
  where
    n = 1 % toEnum (length xs)
    
die = uniform [1..6]

>>> probOf (7==) $ do
  x <- die
  y <- die
  pure (x+y)
1 % 6

The Bells and Whistles

As elegant as the above approach is, it leaves something to be desired when it comes to efficiency. In particular, you’ll see a combinatorial explosion at every step. To demonstrate, let’s take the example above, using three-sided dice instead so it doesn’t take up too much space.

die = uniform [1..3]

example = do
  x <- die
  y <- die
  pure (x+y)

The probability table looks like this:

But the internal representation looks like this:

States are duplicated, because the implementation has no way of knowing that two outcomes are the same. We could collapse equivalent outcomes if we used a Map, but then we can’t implement Functor, Applicative, or Monad. The types:

class Functor f where
    fmap :: (a -> b) -> f a -> f b

class Functor f => Applicative f where
    pure :: a -> f a
    (<*>) :: f (a -> b) -> f a -> f b

class Applicative f => Monad f where
    (>>=) :: f a -> (a -> f b) -> f b

Don’t allow an Ord constraint, which is what we’d need to remove duplicates. We can instead make our own classes which do allow constraints:

{-# LANGUAGE RebindableSyntax #-}
{-# LANGUAGE TypeFamilies     #-}

import Prelude hiding (Functor(..),Applicative(..),Monad(..))

import Data.Kind

class Functor f where
    type Domain f a :: Constraint
    type Domain f a = ()
    fmap :: Domain f b => (a -> b) -> f a -> f b

class Functor f => Applicative f where
    {-# MINIMAL pure, liftA2 #-}
    pure   :: Domain f a => a -> f a
    liftA2 :: Domain f c => (a -> b -> c) -> f a -> f b -> f c
    
    (<*>) :: Domain f b => f (a -> b) -> f a -> f b
    (<*>) = liftA2 ($) 

class Applicative f => Monad f where
    (>>=) :: Domain f b => f a -> (a -> f b) -> f b

fail :: String -> a
fail = error

return :: (Applicative f, Domain f a) => a -> f a
return = pure

This setup gets over a couple common annoyances in Haskell, like making Data.Set a Monad:

instance Functor Set where
    type Domain Set a = Ord a
    fmap = Set.map

instance Applicative Set where
    pure = Set.singleton
    liftA2 f xs ys = do
        x <- xs
        y <- ys
        pure (f x y)

instance Monad Set where
    (>>=) = flip foldMap

And, of course, the probability monad:

newtype Prob a = Prob
    { runProb :: Map a Rational
    }

instance Functor Prob where
    type Domain Prob a = Ord a
    fmap f = Prob . Map.mapKeysWith (+) f . runProb

instance Applicative Prob where
    pure x = Prob (Map.singleton x 1)
    liftA2 f xs ys = do
      x <- xs
      y <- ys
      pure (f x y)
      
instance Ord a => Monoid (Prob a) where
    mempty = Prob Map.empty
    mappend (Prob xs) (Prob ys) = Prob (Map.unionWith (+) xs ys)

instance Monad Prob where
    Prob xs >>= f
        = Map.foldMapWithKey ((Prob .) . flip (Map.map . (*)) . runProb . f) xs

support = Map.keys . runProb

expect p = getSum . Map.foldMapWithKey (\k v -> Sum (p k * v)) . runProb

probOf p = expect (bool 0 1 . p)

uniform xs = Prob (Map.fromList [ (x,n) | x <- xs ])
  where
    n = 1 % toEnum (length xs)

ifThenElse True t _ = t
ifThenElse False _ f = f

die = uniform [1..6]

>>> probOf (7==) $ do
  x <- die
  y <- die
  pure (x + y)
1 % 6

Free

Coming up with the right implementation all at once is quite difficult: luckily, there are more general techniques for designing DSLs that break the problem into smaller parts, which also give us some insight into the underlying composition of the probability monad.

The technique relies on an algebraic concept called “free objects”. A free object for some class is a minimal implementation of that class. The classic example is lists: they’re the free monoid. Monoid requires that you have an additive operation, an empty element, and that the additive operation be associative. Lists have all of these things: what makes them free, though, is that they have nothing else. For instance, the additive operation on lists (concatenation) isn’t commutative: if it was, they wouldn’t be the free monoid any more, because they satisfy an extra law that’s not in monoid.

For our case, we can use the free monad: this takes a functor and gives it a monad instance, in a way we know will satisfy all the laws. This encoding is used in several papers (Ścibior, Ghahramani, and Gordon 2015; Larsen 2011).

The idea is to first figure out what primitive operation you need. We’ll use weighted choice:

choose :: Prob a -> Rational -> Prob a -> Prob a
choose = ...

Then you encode it as a functor:

data Choose a
    = Choose Rational a a
    deriving (Functor,Foldable)

We’ll say the left-hand-choice has chance $p$ , and the right-hand $1-p$ . Then, you just wrap it in the free monad:

type Prob = Free Choose

And you already have a monad instance. Support comes from the Foldable instance:

import Data.Foldable

support :: Prob a -> [a]
support = toList

Expectation is an “interpreter” for the DSL:

expect :: (a -> Rational) -> Prob a -> Rational
expect p = iter f . fmap p
  where
    f (Choose c l r) = l * c + r * (1-c)

For building up the tree, we can use Huffman’s algorithm:

fromList :: (a -> Rational) -> [a] -> Prob a
fromList p = go . foldMap (\x -> singleton (p x) (Pure x))
  where
    go xs = case minView xs of
      Nothing -> error "empty list"
      Just ((xp,x),ys) -> case minView ys of
        Nothing -> x
        Just ((yp,y),zs) ->
          go (insertHeap (xp+yp) (Free (Choose (xp/(xp+yp)) x y)) zs)

And finally, it gets the same notation as before:

uniform = fromList (const 1)

die = uniform [1..6]

probOf p = expect (bool 0 1 . p)

>>> probOf (7==) $ do
  x <- die
  y <- die
  pure (x + y)
1 % 6

One of the advantages of the free approach is that it’s easy to define multiple interpreters. We could, for instance, write an interpreter that constructs a diagram:

>>> drawTree ((,) <$> uniform "abc" <*> uniform "de")
           ┌('c','d')
     ┌1 % 2┤
     │     └('c','e')
1 % 3┤
     │           ┌('a','d')
     │     ┌1 % 2┤
     │     │     └('a','e')
     └1 % 2┤
           │     ┌('b','d')
           └1 % 2┤
                 └('b','e')

Final

There’s a lot to be said about free objects in category theory, also. Specifically, they’re related to initial and terminal (also called final) objects. The encoding above is initial, the final encoding is simply Cont:

newtype Cont r a = Cont { runCont :: (a -> r) -> r }

type Prob = Cont Rational

Here, also, we get the monad instance for free. In contrast to previously, expect is free:

expect = flip runCont

Support, though, isn’t possible.

This version is also called the Giry monad: there’s a deep and fascinating theory behind it, which I probably won’t be able to do justice to here. Check out Jared Tobin’s post -Tobin (2017) for a good deep dive on it.

Cofree

The branching structure of the tree captures the semantics of the probability monad well, but it doesn’t give us much insight into the original implementation. The question is, how can we deconstruct this:

newtype Prob a
    = Prob
    { runProb :: [(a, Rational)]
    }

Eric Kidd -Kidd (2007) pointed out that the monad is the composition of the writer and list monads:

type Prob = WriterT (Product Rational) []

but that seems unsatisfying: in contrast to the tree-based version, we don’t encode any branching structure, we’re able to have empty distributions, and it has the combinatorial explosion problem.

Adding a weighting to nondeterminism is encapsulated more concretely by the ListT transformer. It looks like this:

newtype ListT m a
    = ListT
    { runListT :: m (Maybe (a, ListT m a))
    }

It’s a cons-list, with an effect before every layer¹.

While this can be used to give us the monad we need, I’ve found that something more like this fits the abstraction better:

data ListT m a
    = ListT a (m (Maybe (ListT m a)))

It’s a nonempty list, with the first element exposed. Turns out this is very similar to the cofree comonad:

data Cofree f a = a :< f (Cofree f a)

Just like the initial free encoding, we can start with a primitive operation:

data Perhaps a
    = Impossible
    | WithChance Rational a
    deriving (Functor,Foldable)

And we get all of our instances as well:

newtype Prob a
    = Prob
    { runProb :: Cofree Perhaps a
    } deriving (Functor,Foldable)
    
instance Comonad Prob where
    extract (Prob xs) = extract xs
    duplicate (Prob xs) = Prob (fmap Prob (duplicate xs))

foldProb :: (a -> Rational -> b -> b) -> (a -> b) -> Prob a -> b
foldProb f b = r . runProb
  where
    r (x :< Impossible) = b x
    r (x :< WithChance p xs) = f x p (r xs)

uniform :: [a] -> Prob a
uniform (x:xs) = Prob (coiterW f (EnvT (length xs) (x :| xs)))
  where
    f (EnvT 0 (_ :| [])) = Impossible
    f (EnvT n (_ :| (y:ys))) 
        = WithChance (1 % fromIntegral n) (EnvT (n - 1) (y:|ys))

expect :: (a -> Rational) -> Prob a -> Rational
expect p = foldProb f p
  where
    f x n xs = (p x * n + xs) / (n + 1)

probOf :: (a -> Bool) -> Prob a -> Rational
probOf p = expect (\x -> if p x then 1 else 0)

instance Applicative Prob where
    pure x = Prob (x :< Impossible)
    (<*>) = ap
    
append :: Prob a -> Rational -> Prob a -> Prob a
append = foldProb f (\x y ->  Prob . (x :<) . WithChance y . runProb)
  where
    f e r a p = Prob . (e :<) . WithChance ip . runProb . a op
      where
        ip = p * r / (p + r + 1)
        op = p / (r + 1)

instance Monad Prob where
    xs >>= f = foldProb (append . f) f xs

We see here that we’re talking about gambling-style odds, rather than probability. I wonder if the two representations are dual somehow?

The application of comonads to streams (ListT) has been explored before (Uustalu and Vene 2005); I wonder if there are any insights to be gleaned from this particular probability comonad.

References

Erwig, Martin, and Steve Kollmansberger. 2006. “Functional pearls: Probabilistic functional programming in Haskell.” Journal of Functional Programming 16 (1): 21–34. doi:10.1017/S0956796805005721.

Kidd, Eric. 2007. “Build your own probability monads.”

Larsen, Ken Friis. 2011. “Memory Efficient Implementation of Probability Monads.”

Ścibior, Adam, Zoubin Ghahramani, and Andrew D. Gordon. 2015. “Practical Probabilistic Programming with Monads.” In Proceedings of the 2015 ACM SIGPLAN Symposium on Haskell, 50:165–176. Haskell ’15. New York, NY, USA: ACM. doi:10.1145/2804302.2804317.

Tobin, Jared. 2017. “Implementing the Giry Monad.” jtobin.io.

Uustalu, Tarmo, and Varmo Vene. 2005. “The Essence of Dataflow Programming.” In Proceedings of the Third Asian Conference on Programming Languages and Systems, 2–18. APLAS’05. Berlin, Heidelberg: Springer-Verlag. doi:10.1007/11575467_2.

Note this is not the same as the ListT in transformers; instead it’s a “ListT done right”.↩︎

Scheduling Effects

Donnacha Oisín Kidney — Sat, 23 Jun 2018 00:00:00 UT

Posted on June 23, 2018

Part 4 of a 10-part series on Breadth-First Traversals

Tags: Haskell

After the last post, Noah Easterly pointed me to their tree-traversals library, and in particular the Phases applicative transformer. It allows you to batch applicative effects to be run together: for the breadth-first traversal, we can batch the effects from each level together, giving us a lovely short solution to the problem.

breadthFirst c = runPhasesForwards . go
  where
    go (x:<xs) = liftA2 (:<) (now (c x)) (delay (traverse go xs))

In my efforts to speed this implementation up, I came across a wide and interesting literature on scheduling effects, which I’ll go through a little here.

Coroutines

The first thing that jumps to mind, for me, when I think of “scheduling” is coroutines. These are constructs that let you finely control the order of execution of effects. They’re well explored in Haskell by now, and most libraries will let you do something like the following:

oneThenTwo = do
  liftIO $ print 1
  delay $ liftIO $ print 2

We first print 1, then, after a delay, we print 2. The delay doesn’t make a difference if we just run the whole thing:

>>> retract oneThenTwo
1
2

But you can see its effect when we use the interleave combinator:

>>> retract $ interleave (replicate 3 oneThenTwo)
1
1
1
2
2
2

Hopefully you can see how useful this might be, and the similarity to the Phases construction.

The genealogy of most coroutine libraries in Haskell seems to trace back to Blažević (2011) or Kiselyov (2012): the implementation I have been using in these past few examples (IterT) comes from a slightly different place. Let’s take a quick detour to explore it a little.

Partiality

In functional programming, there are several constructions for modeling error-like states: Maybe for your nulls, Either for your exceptions. What separates these approaches from the “unsafe” variants (null pointers, unchecked exceptions) is that we can prove, in the type system, that the error case is handled correctly.

Conspicuously absent from the usual toolbox for modeling partiality is a way to model nontermination. At first glance, it may seem strange to attempt to do so in Haskell. After all, if I have a function of type:

String -> Int

I can prove that I won’t throw any errors (with Either, that is), because the type Int doesn’t contain Left _. I’ve also proved, miraculously, that I won’t make any null dereferences, because Int also doesn’t contain Nothing. I haven’t proved, however, that I won’t loop infinitely, because (in Haskell), Int absolutely does contain $\bot$ .

So we’re somewhat scuppered. On the other hand, While we can’t prove termination in Haskell, we can:

Model it.
Prove it in something else.

Which is exactly what Venanzio Capretta did in the fascinating (and quite accessible) talk “Partiality is an effect” [Capretta, Altenkirch, and Uustalu (2004)]¹.

The monad in question looks like this:

data Iter a
    = Now a
    | Later (Inf (Iter a))

We’re writing in Idris for the time being, so that we can prove termination and so on. The “recursive call” to Iter is guarded by the Inf type: this turns on a different kind of totality checking in the compiler. Usually, Idris will prevent you from constructing infinite values. But that’s exactly what we want to do here. Take the little-known function until:

until :: (a -> Bool) -> (a -> a) -> a -> a

It’s clearly not necessarily total, and the totality checker will complain as such when we try and implement it directly:

until : (a -> Bool) -> (a -> a) -> a -> a
until p f x = if p x then x else until p f (f x)

But we can use Iter to model that possible totality:

until : (a -> Bool) -> (a -> a) -> a -> Iter a
until p f x = if p x then Now x else Later (until p f (f x))

Of course, nothing’s for free: when we get the ability to construct infinite values, we lose the ability to consume them.

run : Iter a -> a
run (Now x) = x
run (Later x) = run x

We get an error on the run function. However, as you would expect, we can run guarded iteration: iteration up until some finite point.

runUntil : Nat -> Iter a -> Maybe a
runUntil Z _ = Nothing
runUntil (S n) (Now x) = Just x
runUntil (S n) (Later x) = runUntil n x

Making our way back to Haskell, we must first—as is the law—add a type parameter, and upgrade our humble monad to a monad transformer:

newtype IterT m a = IterT { runIterT :: m (Either a (IterT m a)) }

type Iter = IterT Identity

The semantic meaning of the extra m here is interesting: each layer adds not just a recursive step, or a single iteration, but a single effect. Interpreting things in this way gets us back to the original goal:

Scheduling

The Later constructor above can be translated to a delay function on the transformer:

delay = IterT . pure . Right

And using this again, we can write the following incredibly short definition for unfoldTreeM_BF:

unfoldTreeM_BF :: Monad m => (b -> m (a, [b])) -> b -> m (Tree a)
unfoldTreeM_BF f = retract . go
  where
    go b = do
      (x,xs) <- lift (f b)
      fmap (Node x) (interleave (map (delay . go) xs))

Applicative

It would be nice to bring this back to traversals, but alas, IterT is pretty monad-centric. What’s more, if it’s analogous to Phases it certainly doesn’t look like it:

data Phases f a where
  Lift :: f a -> Phases f a
  (:<*>) :: f (a -> b) -> Phases f a -> Phases f b

However, in the documentation for IterT, there’s the following little note:

IterT ~ FreeT Identity

Where FreeT is the free monad transformer. This seems to strongly hint that we could get the same thing for applicatives with ApT. Let’s try it:

newtype Phases f a = Phases
    { runPhases :: ApT Identity f a
    } deriving Functor

The Applicative instance is a little hairy, but it seems correct:

Applicative Instance

Rotations

Donnacha Oisín Kidney — Sun, 03 Jun 2018 00:00:00 UT

Posted on June 3, 2018

Tags: Haskell

This is just some cool-looking stuff I figure out when I was trying to figure out zipper-like algorithms. When I do get around to doing a deep dive on zippers (especially comonadic zippers) I’ll probably be able to write a full post on some of the underlying theory (with maybe some more efficient implementations).

{-# LANGUAGE FlexibleContexts #-}

module Rotations where

import Control.Monad.Tardis
import Control.Applicative ((<**>))

-- | >>> rotations "abcd"
-- ["abcd","bcda","cdab","dabc"]
rotations :: [a] -> [[a]]
rotations = flip evalTardis (id,id) . traverse f
  where
    f x = do
      xs <- pure [] <**> getPast <**> getFuture
      modifyBackwards ((:) x .)
      modifyForwards  (. (:) x)
      pure xs

Breadth-First Traversals in Far Too Much Detail

Donnacha Oisín Kidney — Sun, 03 Jun 2018 00:00:00 UT

Posted on June 3, 2018

Part 3 of a 10-part series on Breadth-First Traversals

Tags: Haskell

After looking at the algorithms I posted last time, I noticed some patterns emerging which I thought deserved a slightly longer post. I’ll go through the problem (Gibbons 2015) in a little more detail, and present some more algorithms to go along with it.

The Problem

The original question was posed by Etian Chatav:

What is the correct way to write breadth first traversal of a [Tree]?

The breadth-first traversal here is a traversal in the lensy sense, i.e:

breadthFirst :: Applicative f => (a -> f b) -> [Tree a] -> f [Tree b]

The Tree type we’re referring to here is a rose tree; we can take the one defined in Data.Tree:

data Tree a
    = Node
    { rootLabel :: a
    , subForest :: [Tree a]
    }

Finally, instead of solving the (somewhat intermediate) problem of traversing a forest, we’ll look directly at traversing the tree itself. In other words, our solution should have the type:

breadthFirst :: Applicative f => (a -> f b) -> Tree a -> f (Tree b)

Breadth-First Enumeration

As in Gibbons (2015), let’s first look at just converting the tree to a list in breadth-first order. In other words, given the tree:

   ┌3
 ┌2┤
 │ └4
1┤
 │ ┌6
 └5┤
   └7

We want the list:

[1,2,5,3,4,6,7]

Last time I looked at this problem, the function I arrived at was as follows:

breadthFirstEnumerate :: Tree a -> [a]
breadthFirstEnumerate ts = f ts b []
  where
    f (Node x xs) fw bw = x : fw (xs : bw)

    b [] = []
    b qs = foldl (foldr f) b qs []

It’s admittedly a little difficult to understand, but it’s really not too complex: we’re popping items off the front of a queue, and pushing the subforest onto the end. fw is the recursive call here: that’s where we send the queue with the element pushed on. Even though it may look like we’re pushing onto the front (as we’re using a cons), this is really the end of the queue, since it’s being consumed in reverse, with foldl.

We can compare it to the technique used in Allison (2006) and Smith (2009), where it’s called corecursive queues. Breadth-first enumeration is accomplished as follows in Smith (2009):

levelOrder :: Tree a -> [a]
levelOrder tr = map rootLabel qs
  where
    qs = enqs [tr] 0 qs

    enqs []     n xs = deq n xs
    enqs (t:ts) n xs = t : enqs  ts (n+1) xs

    deq 0 _      = []
    deq n (x:xs) = enqs (subForest x) (n-1) xs

We get to avoid tracking the length of the queue, however.

Level-Order Enumeration

Before we go the full way to traversal, we can try add a little structure to our breadth-first enumeration, by delimiting between levels in the tree. We want our function to have the following type:

levels :: Tree a -> [[a]]

Looking back at our example tree:

   ┌3
 ┌2┤
 │ └4
1┤
 │ ┌6
 └5┤
   └7

We now want the list:

[[1],[2,5],[3,4,6,7]]

This function is strictly more powerful than breadthFirstEnumerate, as we can define one in terms of the other:

breadthFirstEnumerate = concat . levels

It’s also just a generally useful function, so there are several example implementations available online.

Iterative-Style

The one provided in Data.Tree is as follows:

levels t =
    map (map rootLabel) $
        takeWhile (not . null) $
        iterate (concatMap subForest) [t]

Pretty nice, but it looks to me like it’s doing a lot of redundant work. We could write it as an unfold:

levels t =  unfoldr (f . concat) [[t]]
  where
    f [] = Nothing
    f xs = Just (unzip [(y,ys) | Node y ys <- xs])

The performance danger here lies in unzip: one could potentially optimize that for a speedup.

With an (implicit) Queue

Another definition, in the style of breadthFirstEnumerate above, is as follows:

levels ts = f b ts [] []
  where
    f k (Node x xs) ls qs = k (x : ls) (xs : qs)

    b _ [] = []
    b k qs = k : foldl (foldl f) b qs [] []

Here, we maintain a stack building up the current level, as well as a queue that we send to the next level. Because we’re consing onto the front of the stack, the subforest needs to be traversed in reverse, to build up the output list in the right order. This is why we’re using a second foldl here, whereas the original had foldr on the inner loop.

Zippy-Style

Looking at the implicit queue version, I noticed that it’s just using a church-encoded pair to reverse the direction of the fold. Instead of doing both reversals, we can use a normal pair, and run it in one direction:

levels ts = b (f ts ([],[]))
  where
    f (Node x xs) (ls,qs) = (x:ls,xs:qs)

    b (_,[]) = []
    b (k,qs) = k : b (foldr (flip (foldr f)) ([],[]) qs)

Secondly, we’re running a fold on the second component of the pair: why not run the fold immediately, rather than building the intermediate list. In fact, we’re running a fold over the whole thing, which we can do straight away:

levels ts = f ts []
  where
    f (Node x xs) (q:qs) = (x:q) : foldr f qs xs
    f (Node x xs) []     = [x]   : foldr f [] xs

After looking at it for a while, I realized it’s similar to an inlined version of the algorithm presented in Gibbons (2015):

levels t = [rootLabel t] : foldr (lzw (++)) [] (map levels (subForest t))
  where
    lzw f (x:xs) (y:ys) = f x y : lzw f xs ys
    lzw _ xs [] = xs
    lzw _ [] ys = ys

Cofree

Before going any further, all of the functions so far can be redefined to work on the cofree comonad:

data Cofree f a = a :< f (Cofree f a)

When f is specialized to [], we get the original rose tree. So far, though, all we actually require is Foldable.

From now on, then, we’ll use Cofree instead of Tree.

Traversing

Finally, we can begin on the traversal itself. We know how to execute the effects in the right order, what’s missing is to build the tree back up in the right order.

Filling

First thing we’ll use is a trick with Traversable, where we fill a container from a list. In other words:

fill [(),(),(),()] [1..] = ([1,2,3,4],[5..])

With the state monad (or applicative, in this case, I suppose), we can define a “pop” action, which takes an element from the supply:

pop = state (\(x:xs) -> (x,xs))

And then we traverse that action over our container:

fill = traverse (const pop)

When we use fill, it’ll have the following type:

breadthFirst :: (Applicative f, Traversable t)
             => (a -> f b) -> Cofree t a -> f (Cofree t b)
breadthFirst = ...
  where
    ...
    fill :: t (Cofree t a) -> State [Cofree t b] (t (Cofree t b))
    fill = traverse (const pop)

Hopefully that makes sense: we’re going to get the subforest from here:

data Cofree t a = a :< t (Cofree t a)
                       ^^^^^^^^^^^^^^

And we’re going to fill it with the result of the traversal, which changes the contents from as to bs.

Composing Applicatives

One of the nice things about working with applicatives is that they compose, in a variety of different ways. In other words, if I have one effect, f, and another g, and I want to run them both on the contents of some list, I can do it in one pass, either by layering the effects, or putting them side-by-side.

In our case, we need to deal with two effects: the one generated by the traversal, (the one the caller wants to use), and the internal state we’re using to fill up the forests in our tree. We could use Compose explicitly, but we can avoid some calls to pure if we write the combinators we’re going to use directly:

map2
    :: (Functor f, Functor g)
    => (a -> b -> c) -> f a -> g b -> f (g c)
map2 f x xs =
    fmap (\y -> fmap (f y) xs) x

app2
    :: (Applicative f, Applicative g)
    => (a -> b -> c -> d) -> f a -> g b -> f (g c) -> f (g d)
app2 f x xs =
    liftA2 (\y -> liftA2 (f y) xs) x

The outer applicative (f) will be the user’s effect, the inner will be State.

Take 1: Zippy-Style Traversing

First we’ll try convert the zippy-style levels to a traversal. First, convert the function over to the cofree comonad:

levels tr = f tr []
  where
    f (x:<xs) (q:qs) = (x:q) : foldr f qs xs
    f (x:<xs) []     = [x]   : foldr f [] xs

Next, instead of building up a list of just the root labels, we’ll pair them with the subforests:

breadthFirst tr = f tr []
  where
    f (x:<xs) (q:qs) = ((x,xs):q) : foldr f qs xs
    f (x:<xs) []     = [(x,xs)]   : foldr f [] xs

Next, we’ll fill the subforests:

breadthFirst tr = f tr []
  where
    f (x:<xs) (q:qs) = ((x,fill xs):q) : foldr f qs xs
    f (x:<xs) []     = [(x,fill xs)]   : foldr f [] xs

Then, we can run the applicative effect on the root label:

breadthFirst c tr = f tr []
  where
    f (x:<xs) (q:qs) = ((c x,fill xs):q) : foldr f qs xs
    f (x:<xs) []     = [(c x,fill xs)]   : foldr f [] xs

Now, to combine the effects, we can use the combinators we defined before:

breadthFirst c tr = f tr []
  where
    f (x:<xs) (q:qs) =
        app2 (\y ys zs -> (y:<ys) : zs) (c x) (fill xs) q : foldr f qs xs
    f (x:<xs) [] =
        map2 (\y ys -> [y:<ys]) (c x) (fill xs) : foldr f [] xs

This builds a list containing all of the level-wise traversals of the tree. To collapse them into one, we can use a fold:

breadthFirst :: (Traversable t, Applicative f)
             => (a -> f b)
             -> Cofree t a
             -> f (Cofree t b)
breadthFirst c tr =
    head <$> foldr (liftA2 evalState) (pure []) (f tr [])
  where
    f (x:<xs) (q:qs) =
        app2 (\y ys zs -> (y:<ys):zs) (c x) (fill xs) q : foldr f qs xs
    f (x:<xs) [] =
        map2 (\y ys -> [y:<ys]) (c x) (fill xs) : foldr f [] xs

Take 2: Queue-Based Traversing

Converting the queue-based implementation is easy once we’ve done it with the zippy one. The result is (to my eye) a little easier to read, also:

breadthFirst
    :: (Applicative f, Traversable t)
    => (a -> f b) -> Cofree t a -> f (Cofree t b)
breadthFirst c tr =
    fmap head (f b tr e [])
  where
    f k (x:<xs) ls qs =
      k (app2 (\y ys zs -> (y:<ys):zs) (c x) (fill xs) ls) (xs:qs)

    b _ [] = pure []
    b l qs = liftA2 evalState l (foldl (foldl f) b qs e [])

    e = pure (pure [])

There are a couple things to notice here: first, we’re not using map2 anywhere. That’s because in the zippy version we were able to notice when the queue was exhausted, so we could just output the singleton effect. Here, instead, we’re using pure (pure []): this is potentially a source of inefficiency, as liftA2 f (pure x) y is less efficient than fmap (f x) y for some applicatives.

On the other hand, we don’t build up a list of levels to be combined with foldr (liftA2 evalState) at any point: we combine them at every level immediately. You may be able to do the same in the zippy version, but I haven’t figured it out yet.

Yoneda

The final point to make here is to do with the very last thing we do in the traversal: fmap head. Strictly speaking, any fmaps in the code should be unnecessary: we should be able to fuse them all with any call to liftA2. This transformation is often called the “Yoneda embedding”. We can use it here like so:

breadthFirst
    :: ∀ t a f b. (Traversable t, Applicative f)
    => (a -> f b) -> Cofree t a -> f (Cofree t b)
breadthFirst c tr = f (b head) tr e []
  where
    f k (x:<xs) ls qs =
        k (app2 (\y ys zs -> (y:<ys) : zs) (c x) (fill xs) ls) (xs : qs)

    b :: ∀ x. ([Cofree t b] -> x)
      -> f (State [Cofree t b] [Cofree t b])
      -> [t (Cofree t a)]
      -> f x
    b k _ [] = pure (k [])
    b k l qs =
        liftA2 (\x -> k . evalState x) l (foldl (foldl f) (b id) qs e [])

    e = pure (pure [])

Notice that we need scoped type variables here, since the type of b changes depending on when it’s called.

Take 3: Iterative Traversing

Transforming the iterative version is slightly different from the other two:

breadthFirst c tr = fmap head (go [tr])
  where
    go [] = pure []
    go xs =
        liftA2
            evalState
            (getCompose (traverse f xs))
            (go (foldr (\(_:<ys) b -> foldr (:) b ys) [] xs))
    f (x:<xs) = Compose (map2 (:<) (c x) (fill xs))

We’re using Compose directly here, in contrast to the other two algorithms.

Comparison

Performance-wise, no one algorithm wins out in every case. For enumeration, the zippy algorithm is the fastest in most cases—except when the tree had a large branching factor; then, the iterative algorithm wins out. For the traversals, the iterative algorithm is usually better—except for monads with more expensive applicative instances.

I’m still not convinced that the zippy traversal is as optimized as it could be, however. If anyone has a better implementation, I’d love to see it!

Fusion

Using the composability of applicatives, we can fuse several operations over traversables into one pass. Unfortunately, however, this can often introduce a memory overhead that makes the whole operation slower overall. One such example is the iterative algorithm above:

breadthFirst c tr = fmap head (go [tr])
  where
    go [] = pure []
    go xs = liftA2 evalState zs (go (ys []))
      where
        Compose (Endo ys,Compose zs) = traverse f xs
    f (x :< xs) =
        Compose
            (Endo (flip (foldr (:)) xs)
            ,Compose (map2 (:<) (c x) (fill xs)))

We only traverse the subforest of each node once now, fusing the fill operation with building the list to send to the recursive call. This is expensive (especially memory-wise), though, and traversing the descendant is cheap; the result is that the one-pass version is slower (in my tests).

Generalizing

The cofree comonad allows us to generalize over the type of “descendants”—from lists (in Tree) to anything traversable. We could also generalize over the type of the traversal itself: given a way to access the descendants of a node, we should be able to traverse all nodes in a breadth-first order. This kind of thing is usually accomplished by Plated: it’s a class that gives you a traversal over the immediate descendants of some recursive type. Adapting the iterative version is relatively simple:

breadthFirstOf :: Traversal' a a -> Traversal' a a
breadthFirstOf trav c tr = fmap head (go [tr])
  where
    go [] = pure []
    go xs =
        liftA2
            evalState
            (getCompose (traverse f xs))
            (go (foldr (\ys b -> foldrOf trav (:) b ys) [] xs))
    f xs = Compose (fmap fill (c xs))
    fill = trav (const (State (\(x:xs) -> (x, xs))))

We can use this version to get back some of the old functions above:

breadthFirstEnumerate ::  Traversable f => Cofree f a -> [a]
breadthFirstEnumerate = toListOf (breadthFirstOf plate . _extract)

Unfolding

Building a tree breadth-first, monadically, is still an unsolved problem (it looks like: Feuer 2015).

Using some of these we can implement a monadic breadth-first unfold for the cofree comonad:

unfoldM :: (Monad m, Traversable t)
        => (b -> m (a, t b))
        -> b
        -> m (Cofree t a)
unfoldM c tr = go head [tr]
  where
    go k [] = pure (k [])
    go k xs = do
        ys <- traverse c xs
        go (k . evalState (traverse f ys)) (toList (Compose (Compose ys)))
    f (x,xs) = fmap (x:<) (fill xs)

References

Feuer, David. 2015. “Is a lazy, breadth-first monadic rose tree unfold possible?” Question. Stack Overflow. https://stackoverflow.com/q/27748526.

Gibbons, Jeremy. 2015. “Breadth-First Traversal.” Patterns in Functional Programming. https://patternsinfp.wordpress.com/2015/03/05/breadth-first-traversal/.

Smith, Leon P. 2009. “Lloyd Allison’s Corecursive Queues: Why Continuations Matter.” The Monad.Reader 14 (14) (July): 28. https://meldingmonads.files.wordpress.com/2009/06/corecqueues.pdf.

Breadth-First Rose Trees: Traversals and the Cofree Comonad

Donnacha Oisín Kidney — Fri, 01 Jun 2018 00:00:00 UT

Posted on June 1, 2018

Part 2 of a 10-part series on Breadth-First Traversals

Tags: Haskell

I was looking again at the issue of writing breadth-first traversals for rose trees, and in particular the problem explored in Gibbons (2015). The breadth-first traversal here is a traversal in the lensy sense.

First, let’s look back at getting the levels out of the tree. Here’s the old function I arrived at last time:

levels :: Forest a -> [[a]]
levels ts = foldl f b ts [] []
  where
    f k (Node x xs) ls qs = k (x : ls) (xs : qs)

    b _ [] = []
    b k qs = k : foldl (foldl f) b qs [] []

After wrangling the definition a little, I got to the following (much cleaner) definition:

levels :: Tree a -> [[a]]
levels tr = f tr [] where
  f (Node x xs) (y:ys) = (x:y) : foldr f ys xs
  f (Node x xs) []     = [x]   : foldr f [] xs

Cofree

Before going any further, all of the functions so far can be redefined to work on the cofree comonad:

data Cofree f a = a :< f (Cofree f a)

When f is specialized to [], we get the original rose tree. But what we actually require is much less specific: levels, for instance, only needs Foldable.

levelsCofree :: Foldable f => Cofree f a -> [[a]]
levelsCofree tr = f tr []
  where
    f (x:<xs) (y:ys) = (x:y) : foldr f ys xs
    f (x:<xs) []     = [x]   : foldr f [] xs

Using this, we can write the efficient breadth-first traversal:

breadthFirst
    :: (Applicative f, Traversable t)
    => (a -> f b) -> Cofree t a -> f (Cofree t b)
breadthFirst c (t:<ts) =
    liftA2 evalState (map2 (:<) (c t) (fill ts)) chld
  where
    chld = foldr (liftA2 evalState) (pure []) (foldr f [] ts)
    fill = traverse (const (state (\(x:xs) -> (x,xs))))

    f (x:<xs) (q:qs)
        = app2 (\y ys zs -> (y:<ys) : zs) (c x) (fill xs) q
        : foldr f qs xs
    f (x:<xs) []
        = map2 (\y ys -> [y:<ys]) (c x) (fill xs)
        : foldr f [] xs

    map2 k x xs = fmap   (\y -> fmap   (k y) xs) x
    app2 k x xs = liftA2 (\y -> liftA2 (k y) xs) x

At every level, the subforest’s shape it taken (fill), and it’s traversed recursively. We can fuse these two steps into one:

breadthFirst
    :: (Traversable t, Applicative f)
    => (a -> f b) -> Cofree t a  -> f (Cofree t b)
breadthFirst c (t:<ts) =
    liftA2
        evalState
        (map2 (:<) (c t) fill)
        (foldr (liftA2 evalState) (pure []) (chld []))
  where
    Compose (Endo chld,fill) = go ts

    go = traverse (\x -> Compose (Endo (f x), state (\(y:ys) -> (y,ys))))

    f (x:<xs) (q:qs) = app2 (\y ys zs -> (y:<ys) : zs) (c x) r q : rs qs
      where Compose (Endo rs,r) = go xs
    f (x:<xs) [] = map2 (\y ys -> [y:<ys]) (c x) r : rs []
      where Compose (Endo rs,r) = go xs

    map2 k x xs = fmap   (\y -> fmap   (k y) xs) x
    app2 k x xs = liftA2 (\y -> liftA2 (k y) xs) x

The overhead from this approach scraps any benefit, though.

Gibbons, Jeremy. 2015. “Breadth-First Traversal.” Patterns in Functional Programming. https://patternsinfp.wordpress.com/2015/03/05/breadth-first-traversal/.

Swapping

Donnacha Oisín Kidney — Wed, 30 May 2018 00:00:00 UT

Posted on May 30, 2018

Tags: Haskell

{-# LANGUAGE RecursiveDo #-}

module Swap where

import qualified Data.Map.Strict as Map
import           Data.Map.Strict   (Map)

import           Data.IntMap          (IntMap)
import qualified Data.IntMap.Strict as IntMap
import qualified Data.IntMap.Lazy   as LazyIntMap

import           Control.Lens

import           Control.Arrow           ((&&&))
import           Control.Monad           ((>=>))
import           Control.Monad.Fix       (mfix)

import           Control.Monad.State     (StateT(..),execState,state)

import           Data.Maybe  (fromMaybe)
import           Data.Monoid (First(..))

Say you want to swap two items in a mapping structure—Data.Map.Strict, Data.HashMap, etc. The normal way uses far too many operations:

-- |
-- >>> swapAt4 1 2 (Map.fromList (zip [1..5] ['a'..]))
-- fromList [(1,'b'),(2,'a'),(3,'c'),(4,'d'),(5,'e')]
swapAt4 :: Ord a => a -> a -> Map a b -> Map a b
swapAt4 i j xs = case Map.lookup i xs of
  Nothing -> xs
  Just x -> case Map.lookup j xs of
    Nothing -> xs
    Just y -> Map.insert i y (Map.insert j x xs)

Two lookups, and two insertions. We can cut it down to three operations with insertLookupWithKey:

-- |
-- >>> swapAt3 1 2 (Map.fromList (zip [1..5] ['a'..]))
-- fromList [(1,'b'),(2,'a'),(3,'c'),(4,'d'),(5,'e')]
swapAt3 :: Ord a => a -> a -> Map a b -> Map a b
swapAt3 i j xs = case Map.lookup i xs of
  Nothing -> xs
  Just x -> case Map.insertLookupWithKey (const const) j x xs of
    (Nothing,_) -> xs
    (Just y,ys) -> Map.insert i y ys

Then, using laziness, we can write the above program circularly, reducing the number of lookups to 2:

swapAt2 :: Ord a => a -> a -> Map a b -> Map a b
swapAt2 i j xs = zs
  where
     (ival,ys) = Map.updateLookupWithKey (replace jval) i xs
     (jval,zs) = Map.updateLookupWithKey (replace ival) j ys
     replace x = const (Just . (`fromMaybe` x))

Unfortunately, Data.Map isn’t lazy enough for this: the above won’t terminate. Interestingly, Data.IntMap is lazy enough:

swapAt2Int :: Int -> Int -> IntMap a -> IntMap a
swapAt2Int i j xs = zs
  where
    (ival,ys) = LazyIntMap.updateLookupWithKey (replace jval) i xs
    (jval,zs) =     IntMap.updateLookupWithKey (replace ival) j ys
    replace x = const (Just . (`fromMaybe` x))

Notice how we have to use the lazy version of updateLookupWithKey. Again, though, this version has a problem: it won’t terminate when one of the keys is missing.

Thankfully, both of our problems can be solved by abstracting a little and using Ixed from lens:

-- |
-- >>> swapIx 1 2 "abc"
-- "acb"
swapIx :: Ixed a => Index a -> Index a -> a -> a
swapIx i j xs = zs
  where
    (First ival, ys) = ix i (replace jval) xs
    (First jval, zs) = ix j (replace ival) ys
    replace x = First . Just &&& (`fromMaybe` x)

Because ix is a traversal, it won’t do anything when there’s a missing key, which is what we want. Also, it adds extra laziness, as the caller of a traversal gets certain extra controls over the strictness of the traversal.

You may notice the stateful pattern above. However, translating it over as-is presents a problem: the circular bindings won’t work in vanilla do notation. For that, we need MonadFix and Recursive Do:

swapSt :: Ixed a => Index a -> Index a -> a -> a
swapSt i j = execState $ mdo
    ival <- replace i jval
    jval <- replace j ival
    pure ()
  where
    replace i (First x) =
        state (ix i (First . Just &&& (`fromMaybe` x)))

Finally, we can use mfix directly, and we’ll get the following clean-looking solution:

swap :: Ixed a => Index a -> Index a -> a -> a
swap i j = execState (mfix (replace i >=> replace j))
  where
    replace i (First x) =
        state (ix i (First . Just &&& (`fromMaybe` x)))

This works for most containers, even strict ones like Data.Map.Strict. It also works for Data.Vector. It does not work for Data.Vector.Unboxed, though.

Sorting Small Things in Haskell

Donnacha Oisín Kidney — Sun, 06 May 2018 00:00:00 UT

Posted on May 6, 2018

Part 1 of a 1-part series on Sorting

Tags: Haskell, Algorithms

I was working on some performance-intensive stuff recently, and I ran into the issue of sorting very small amounts of values (think 3, 4, 5).

The standard way to do this is with sorting networks. The way I’ll be using doesn’t actually perform any parallelism (unfortunately), but it is a clean way to write the networks in Haskell without too much repetition.

This website will generate an optimal sorting network for your given size, and the output (for 3) looks like this:

[[1,2]]
[[0,2]]
[[0,1]]

Each pair of indices represents a “compare-and-swap” operation: so the first line means “compare the value at 1 to the value at 2: if it’s bigger, swap them”. For 5, the network looks like this:

[[0,1],[2,3]]
[[0,2],[1,3]]
[[1,2],[0,4]]
[[1,4]]
[[2,4]]
[[3,4]]

Pairs on the same line can be performed in parallel.

For our case, I’m going to be looking at sorting tuples, but the technique can easily be generalized to vectors, etc.

The first trick is to figure out how to do “swapping”: we don’t want mutation, so what we can do instead is swap the reference to some value, by shadowing its name. In other words:

swap2 :: (a -> a -> Bool) -> a -> a -> (a, a)
swap2 lte x y | lte x y = (x, y)
              | otherwise = (y, x)

sort3 :: (a -> a -> Bool) -> (a,a,a) -> (a,a,a)
sort3 lte (_0,_1,_2)
    = case swap2 lte _1 _2 of
      (_1, _2) -> case swap2 lte _0 _2 of
        (_0, _2) -> case swap2 lte _0 _1 of
          (_0, _1) -> (_0, _1, _2)

The indentation is hard to read, though, and wrapping-and-unwrapping tuples makes me nervous about the performance (although it may be inlined). The next step is to church-encode the pairs returned:

swap2 :: (a -> a -> Bool) -> a -> a -> (a -> a -> b) -> b
swap2 lte x y k
    | lte x y = k x y
    | otherwise = k y x

sort3 :: (a -> a -> Bool) -> (a,a,a) -> (a,a,a)
sort3 lte (_0,_1,_2)
    = swap2 lte _1 _2 $ \ _1 _2 ->
      swap2 lte _0 _2 $ \ _0 _2 ->
      swap2 lte _0 _1 $ \ _0 _1 ->
      (_0,_1,_2)

Then, to get this to compile down to efficient code, we can make judicious use of inline from GHC.Exts:

import GHC.Exts (inline)

swap2 :: (a -> a -> Bool) -> a -> a -> (a -> a -> b) -> b
swap2 lte x y k
    | inline lte x y = inline k x y
    | otherwise = inline k y x
{-# INLINE swap2 #-}

sort3 :: (a -> a -> Bool) -> (a, a, a) -> (a, a, a)
sort3 lte (_0,_1,_2)
    = swap2 lte _1 _2 $ \ _1 _2 ->
      swap2 lte _0 _2 $ \ _0 _2 ->
      swap2 lte _0 _1 $ \ _0 _1 ->
      (_0,_1,_2)
{-# INLINE sort3 #-}

And to see if this really does make efficient code, let’s look at the core (cleaned up):

sort3
  = \ (lte :: a -> a -> Bool)
      (ds :: (a, a, a)) ->
      case ds of wild_X8 (_0, _1, _2) ->
      case lte _1 _2 of
        False ->
          case lte _0 _1 of
            False -> (_2, _1, _0)
            True ->
              case lte _0 _2 of
                False -> (_2, _0, _1)
                True -> (_0, _2, _1)
        True ->
          case lte _0 _2 of
            False ->
              case lte _2 _1 of
                False -> (_1, _2, _0)
                True -> (_2, _1, _0)
            True ->
              case lte _0 _1 of
                False -> (_1, _0, _2)
                True -> wild_X8

Fantastic! When we specialize to Int, we get all of the proper unpacking:

sort3Int :: (Int, Int, Int) -> (Int, Int, Int)
sort3Int = inline sort3 (<=)

Core (with just the variable names cleaned up this time):

sort3Int
  = \ (w :: (Int, Int, Int)) ->
      case w of w_X { (_0, _1, _2) ->
      case _0 of w_0 { GHC.Types.I# _0U ->
      case _1 of w_1 { GHC.Types.I# _1U ->
      case _2 of w_2 { GHC.Types.I# _2U ->
      case GHC.Prim.<=# _1U _2U of {
        __DEFAULT ->
          case GHC.Prim.<=# _0U _1U of {
            __DEFAULT -> (w_2, w_1, w_0);
            1# ->
              case GHC.Prim.<=# _0U _2U of {
                __DEFAULT -> (w_2, w_0, w_1);
                1# -> (w_0, w_2, w_1)
              }
          };
        1# ->
          case GHC.Prim.<=# _0U _2U of {
            __DEFAULT ->
              case GHC.Prim.<=# _2U _1U of {
                __DEFAULT -> (w_1, w_2, w_0);
                1# -> (w_2, w_1, w_0)
              };
            1# ->
              case GHC.Prim.<=# _0U _1U of {
                __DEFAULT -> (w_1, w_0, w_2);
                1# -> w_X
              }
          }
      }
      }
      }
      }
      }

Now, for the real test: sorting 5-tuples, using the network above.

sort5 :: (a -> a -> Bool) -> (a,a,a,a,a) -> (a,a,a,a,a)
sort5 lte (_0,_1,_2,_3,_4)
    = swap2 lte _0 _1 $ \ _0 _1 ->
      swap2 lte _2 _3 $ \ _2 _3 ->
      swap2 lte _0 _2 $ \ _0 _2 ->
      swap2 lte _1 _3 $ \ _1 _3 ->
      swap2 lte _1 _2 $ \ _1 _2 ->
      swap2 lte _0 _4 $ \ _0 _4 ->
      swap2 lte _1 _4 $ \ _1 _4 ->
      swap2 lte _2 _4 $ \ _2 _4 ->
      swap2 lte _3 _4 $ \ _3 _4 ->
      (_0,_1,_2,_3,_4)
{-# INLINE sort5 #-}

The core output from this is over 1000 lines long: you can see it (with the variable names cleaned up) here.

In my benchmarks, these functions are indeed quicker than their counterparts in vector, but I’m not confident in my knowledge of Haskell performance to make much of a strong statement about them.

Type-Level Induction in Haskell

Donnacha Oisín Kidney — Sat, 05 May 2018 00:00:00 UT

Posted on May 5, 2018

Tags: Haskell, Dependent Types

The code from this post is available as a gist.

One of the most basic tools for use in type-level programming is the Peano definition of the natural numbers:

data ℕ
    = Z
    | S ℕ

Using the new TypeFamilyDependencies extension, these numbers can be used to describe the “size” of some type. I’m going to use the proportion symbol here:

type family (t ∷ k) ∝ (n ∷ ℕ) = (a ∷ Type) | a → t n k

Using this type family we can describe induction on the natural numbers:

class Finite n where
    induction ∷ t ∝ Z → (∀ k. t ∝ k → t ∝ S k) → t ∝ n

instance Finite Z where
    induction z _ = z
    {-# inline induction #-}

instance Finite n ⇒ Finite (S n) where
    induction z s = s (induction z s)
    {-# inline induction #-}

The induction function reads as the standard mathematical definition of induction: given a proof (value) of the zero case, and a proof that any proof is true for its successor, we can give you a proof of the case for any finite number.

An added bonus here is that the size of something can usually be resolved at compile-time, so any inductive function on it should also be resolved at compile time.

We can use it to provide the standard instances for basic length-indexed lists:

infixr 5 :-
data List n a where
        Nil  ∷ List Z a
        (:-) ∷ a → List n a → List (S n) a

Some instances for those lists are easy:

instance Functor (List n) where
    fmap _ Nil = Nil
    fmap f (x :- xs) = f x :- fmap f xs

However, for Applicative, we need some way to recurse on the size of the list. This is where induction comes in.

type instance '(List,a) ∝ n = List n a

This lets us write pure in a pleasingly simple way:

instance Finite n ⇒
         Applicative (List n) where
    pure x = induction Nil (x :-)

But can we also write <*> using induction? Yes! Because we’ve factored out the induction itself, we just need to describe the notion of a “sized” function:

data a ↦ b
type instance ((x ∷ a) ↦ (y ∷ b)) ∝ n = (x ∝ n) → (y ∝ n)

Then we can write <*> as so:

instance Finite n ⇒
         Applicative (List n) where
    pure x = induction Nil (x :-)
    (<*>) =
        induction
            (\Nil Nil → Nil)
            (\k (f :- fs) (x :- xs) → f x :- k fs xs)

What about the Monad instance? For that, we need a little bit of plumbing: the type signature of >>= is:

(>>=) ∷ m a → (a → m b) → m b

One of the parameters (the second a) doesn’t have a size: we’ll need to work around that, with Const:

type instance (Const a ∷ ℕ → Type) ∝ n = Const a n

Using this, we can write our Monad instance:

head' ∷ List (S n) a → a
head' (x :- _) = x

tail' ∷ List (S n) a → List n a
tail' (_ :- xs) = xs

instance Finite n ⇒
         Monad (List n) where
    xs >>= (f ∷ a → List n b) =
        induction
            (\Nil _ → Nil)
            (\k (y :- ys) fn → head' (fn (Const y)) :-
                               k ys (tail' . fn . Const . getConst))
            xs
            (f . getConst ∷ Const a n → List n b)

Type Family Dependencies

Getting the above to work actually took a surprising amount of work: the crux is that the ∝ type family needs to be injective, so the “successor” proof can typecheck. Unfortunately, this means that every type can only have one notion of “size”. What I’d prefer is to be able to pass in a function indicating exactly how to get the size out of a type, that could change depending on the situation. So we could recurse on the first argument of a function, for instance, or just its second, or just the result. This would need either type-level lambdas (which would be cool), or generalized type family dependencies.

5 Cool Things You Can Do With Pattern Synonyms

Donnacha Oisín Kidney — Thu, 12 Apr 2018 00:00:00 UT

Posted on April 12, 2018

Tags: Haskell, Pattern Synonyms

Pattern Synonyms is an excellent extension for Haskell. There are some very cool examples of their use out there, and I thought I’d add to the list.

Make Things Look Like Lists

Lists are the fundamental data structure for functional programmers. Unfortunately, once more specialized structures are required, you often have to switch over to an uncomfortable, annoying API which isn’t as pleasant or fun to use as cons and nil. With pattern synonyms, though, that’s not so! For instance, here’s how you would do it with a run-length-encoded list:

data List a
    = Nil
    | ConsN {-# UNPACK #-} !Int
            a
            (List a)

cons :: Eq a => a -> List a -> List a
cons x (ConsN i y ys)
  | x == y = ConsN (i+1) y ys
cons x xs = ConsN 1 x xs

uncons :: List a -> Maybe (a, List a)
uncons Nil = Nothing
uncons (ConsN 1 x xs) = Just (x, xs)
uncons (ConsN n x xs) = Just (x, ConsN (n-1) x xs)

infixr 5 :-
pattern (:-) :: Eq a => a -> List a -> List a
pattern x :- xs <- (uncons -> Just (x, xs))
  where
    x :- xs = cons x xs
{-# COMPLETE Nil, (:-) #-}

zip :: List a -> List b -> List (a,b)
zip (x :- xs) (y :- ys) = (x,y) :- zip xs ys
zip _ _ = Nil

A little more useful would be to do the same with a heap:

data Tree a
    = Leaf
    | Node a (Tree a) (Tree a)

smerge :: Ord a => Tree a -> Tree a -> Tree a
smerge Leaf ys = ys
smerge xs Leaf = xs
smerge h1@(Node x lx rx) h2@(Node y ly ry)
  | x <= y    = Node x (smerge h2 rx) lx
  | otherwise = Node y (smerge h1 ry) ly

cons :: Ord a => a -> Tree a -> Tree a
cons x = smerge (Node x Leaf Leaf)

uncons :: Ord a => Tree a -> Maybe (a, Tree a)
uncons Leaf = Nothing
uncons (Node x l r) = Just (x, smerge l r)

infixr 5 :-
pattern (:-) :: Ord a => a -> Tree a -> Tree a
pattern x :- xs <- (uncons -> Just (x, xs))
  where
    x :- xs = cons x xs
{-# COMPLETE Leaf, (:-) #-}

sort :: Ord a => [a] -> [a]
sort = go . foldr (:-) Leaf
  where
    go Leaf = []
    go (x :- xs) = x : go xs

In fact, this pattern can be generalized, so any container-like-thing with a cons-like-thing can be modified as you would with lists. You can see the generalization in lens.

Retroactively Make LYAH Examples Work

One of the most confusing things I remember about learning Haskell early-on was that the vast majority of the Monads examples didn’t work, because they were written pre-transformers. In other words, the state monad was defined like so:

newtype State s a = State { runState :: s -> (a, s) }

But in transformers nowadays (which is where you get State from if you import it in the normal way), the definition is:

newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }

type State s = StateT s Identity

This results in some very confusing error messages when you try run example code.

However, we can pretend that the change never happened, with a simple pattern synonym:

newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }

type State s = StateT s Identity

runState :: State s a -> s -> (a, s)
runState xs = runIdentity . runStateT xs

pattern State :: (s -> (a, s)) -> State s a
pattern State x <- (runState -> x)
  where
    State x = StateT (Identity . x)

Getting Type-Level Numbers With an Efficient Runtime Representation

If you want to write type-level proofs on numbers, you’ll probably end up using Peano numerals and singletons:

data Nat = Z | S Nat

data Natty n where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)

type family (+) (n :: Nat) (m :: Nat) :: Nat where
  Z + m = m
  S n + m = S (n + m)

plusZeroIsZero :: Natty n -> n + Z :~: n
plusZeroIsZero Zy = Refl
plusZeroIsZero (Sy n) = case plusZeroIsZero n of
  Refl -> Refl

Pretty cool, right? We can even erase the proof (if we really trust it) using rewrite rules:

{-# RULES 
"plusZeroIsZero" forall n. plusZeroIsZero n = unsafeCoerce Refl
#-}

This isn’t ideal, but it’s getting there.

However, if we ever want to use these things at runtime (perhaps as a type-level indication of some data structure’s size), we’re going to rely on the value-level Peano addition, which is bad news.

Not so with pattern synonyms!

data family The k :: k -> Type

class Sing (a :: k) where sing :: The k (a :: k)

data Nat = Z | S Nat

newtype instance The Nat n = NatSing Natural

instance Sing Z where
    sing = NatSing 0

instance Sing n => Sing (S n) where
    sing =
        (coerce :: (Natural -> Natural) -> (The Nat n -> The Nat (S n)))
            succ sing

data Natty n where
        ZZy :: Natty Z
        SSy :: The Nat n -> Natty (S n)

getNatty :: The Nat n -> Natty n
getNatty (NatSing n :: The Nat n) = case n of
  0 -> gcastWith (unsafeCoerce Refl :: n :~: Z) ZZy
  _ -> gcastWith (unsafeCoerce Refl :: n :~: S m) (SSy (NatSing (pred n)))

pattern Zy :: () => (n ~ Z) => The Nat n
pattern Zy <- (getNatty -> ZZy) where Zy = NatSing 0

pattern Sy :: () => (n ~ S m) => The Nat m -> The Nat n
pattern Sy x <- (getNatty -> SSy x) where Sy (NatSing x) = NatSing (succ x)
{-# COMPLETE Zy, Sy #-}

type family (+) (n :: Nat) (m :: Nat) :: Nat where
        Z + m = m
        S n + m = S (n + m)

-- | Efficient addition, with type-level proof.
add :: The Nat n -> The Nat m -> The Nat (n + m)
add = (coerce :: (Natural -> Natural -> Natural)
              -> The Nat n -> The Nat m -> The Nat (n + m)) (+)

-- | Proof on efficient representation.
addZeroRight :: The Nat n -> n + Z :~: n
addZeroRight Zy = Refl
addZeroRight (Sy n) = gcastWith (addZeroRight n) Refl

(unfortunately, incomplete pattern warnings don’t work here)

Hide Your Implementations

So you’ve got a tree type:

data Tree a
    = Tip
    | Bin a (Tree a) (Tree a)

And you’ve spent some time writing a (reasonably difficult) function on the tree:

Complicated function on the tree

Strict Applicative Transformer

Donnacha Oisín Kidney — Wed, 21 Mar 2018 00:00:00 UT

Posted on March 21, 2018

Tags: Haskell

Adapted from this post on reddit. It’s possible to take a lazy traversal and make it strict.

{-# LANGUAGE BangPatterns #-}

module Seq (fmap',traverse') where

import Data.Coerce
import Control.Applicative (liftA2)

newtype Seq a = Seq { unSeq :: a }

instance Functor Seq where
  fmap f x = let !vx = unSeq x in Seq (f vx)
  {-# INLINE fmap #-}
  x <$ xs = let !_ = unSeq xs in Seq x
  {-# INLINE (<$) #-}

instance Applicative Seq where
  pure = Seq
  {-# INLINE pure #-}
  fs <*> xs = let !vx = unSeq xs in Seq (unSeq fs vx)
  {-# INLINE (<*>) #-}
  xs *> ys = let !_ = unSeq xs in ys
  {-# INLINE (*>) #-}
  xs <* ys = let !_ = unSeq ys in xs
  {-# INLINE (<*) #-}

fmap' :: Traversable f => (a -> b) -> f a -> f b
fmap' = (coerce :: ((a -> Seq b) -> f a -> Seq (f b)) -> (a -> b) -> f a -> f b) traverse
{-# INLINE fmap' #-}

newtype SeqT f a = SeqT { unSeqT :: f a }

instance Functor f => Functor (SeqT f) where
  fmap f = SeqT #. fmap (\ !vx -> f vx) .# unSeqT
  {-# INLINE fmap #-}

(#.) :: Coercible b c => (b -> c) -> (a -> b) -> a -> c
(#.) _ = coerce
{-# INLINE (#.) #-}

(.#) :: Coercible a b => (b -> c) -> (a -> b) -> a -> c
(.#) f _ = coerce f
{-# INLINE (.#) #-}

instance Applicative f => Applicative (SeqT f) where
  pure = SeqT #. pure
  {-# INLINE pure #-}
  (<*>) = (coerce :: (f (a -> b) -> f a -> f b) -> (SeqT f (a -> b) -> SeqT f a -> SeqT f b)) (liftA2 (\fs !vx -> fs vx))
  {-# INLINE (<*>) #-}
  liftA2 f = (coerce :: (f a -> f b -> f c) -> (SeqT f a -> SeqT f b -> SeqT f c)) (liftA2 (\ !x !y -> f x y))
  {-# INLINE liftA2 #-}

traverse' :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
traverse' = (coerce :: ((a -> SeqT f b) -> t a -> SeqT f (t b)) -> (a -> f b) -> t a -> f (t b)) traverse
{-# INLINE traverse' #-}

You need traversable in order to get the strictness: there’s a similar way to get a stricter fmap with monad instead:

(<$!>) :: Monad m => (a -> b) -> m a -> m b
{-# INLINE (<$!>) #-}
f <$!> m = do
  x <- m
  let z = f x
  z `seq` return z

Countdown

Donnacha Oisín Kidney — Tue, 20 Mar 2018 00:00:00 UT

Posted on March 20, 2018

Tags: Haskell, Algorithms

There’s a popular UK TV show called Countdown with a round where contestants have to get as close to some target number as possible by constructing an arithmetic expression from six random numbers.

You don’t have to use all of the numbers, and you’re allowed use four operations: addition, subtraction, multiplication, and division. Additionally, each stage of the calculation must result in a positive integer.

Here’s an example. Try get to the target 586:

$100,25,1,5,3,10$

On the show, contestants get 30 seconds to think of an answer.

Solution

Convolutions

Donnacha Oisín Kidney — Mon, 19 Mar 2018 00:00:00 UT

Posted on March 19, 2018

Tags: Haskell

Convolutions of a list give a different traversal order than what you would traditionally expect. Adapted from here.

-- | >>> mapM_ print ([1..5] <.> [1..5])
-- [(1,1)]
-- [(1,2),(2,1)]
-- [(1,3),(2,2),(3,1)]
-- [(1,4),(2,3),(3,2),(4,1)]
-- [(1,5),(2,4),(3,3),(4,2),(5,1)]
-- [(2,5),(3,4),(4,3),(5,2)]
-- [(3,5),(4,4),(5,3)]
-- [(4,5),(5,4)]
-- [(5,5)]
(<.>) :: [a] -> [b] -> [[(a,b)]]
xs <.> ys = foldr f [] xs where
  f x = foldr g id ys . ([] :) where
    g y k ~(z :~ zs) = ((x,y) : z) : k zs

unconsMon :: Monoid m => [m] -> (m, [m])
unconsMon (x:xs) = (x, xs)
unconsMon []     = (mempty, [])

pattern (:~) :: Monoid m => m -> [m] -> [m]
pattern (:~) x xs <- (unconsMon -> (x, xs))

Rose Trees, Breadth-First

Donnacha Oisín Kidney — Sat, 17 Mar 2018 00:00:00 UT

Posted on March 17, 2018

Part 1 of a 10-part series on Breadth-First Traversals

Tags: Haskell

In contrast to the more common binary trees, in a rose tree every node can have any number of children.

data Tree a
    = Node
    { root   :: a
    , forest :: Forest a
    }

type Forest a = [Tree a]

One of the important manipulations of this data structure, which forms the basis for several other algorithms, is a breadth-first traversal. I’d like to go through a couple of techniques for implementing it, and how more generally you can often get away with using much simpler data structures if you really pinpoint the API you need from them.

As a general technique, Okasaki (2000) advises that a queue be used:

breadthFirst :: Tree a -> [a]
breadthFirst tr = go (singleton tr)
  where
    go q = case pop q of
      Nothing -> []
      Just (Node x xs,qs) -> x : go (qs `append` xs)

There are three functions left undefined there: singleton, pop, and append. They represent the API of our as-of-yet unimplemented queue, and their complexity will dictate the complexity of the overall algorithm. As a (bad) first choice, we could use simple lists, with the functions defined thus:

singleton x = [x]
pop (x:xs) = Just (x,xs)
pop [] = Nothing
append = (++)

Those repeated appends are bad news. The queue needs to be able to support popping from one side and appending from the other, which is something lists absolutely cannot do well.

We could swap in a more general queue implementation, possibly using Data.Sequence, or a pair of lists. But these are more complex and general than we need, so let’s try and pare down the requirements a little more.

First, we don’t need a pop: the go function can be expressed as a fold instead. Second, we don’t need every append to be immediately stuck into the queue, we can batch them, first appending to a structure that’s efficient for appends, and then converting that to a structure which is efficient for folds. In code:

breadthFirst :: Forest a -> [a]
breadthFirst ts = foldr f b ts []
  where
    f (Node x xs) fw bw = x : fw (xs : bw)

    b [] = []
    b qs = foldl (foldr f) b qs []

We’re consing instead of appending, but the consumption is being done in the correct direction anyway, because of the foldl.

Levels

So next step: to get the levels function from Data.Tree. Instead of doing a breadth-first traversal, it returns the nodes at each level of the tree. Conceptually, every time we did the reverse above (called foldl), we will do a cons as well:

levels :: Forest a -> [[a]]
levels ts = foldl f b ts [] []
  where
    f k (Node x xs) ls qs = k (x : ls) (xs : qs)

    b _ [] = []
    b k qs = k : foldl (foldl f) b qs [] []

Unfolding

The original reason I started work on these problems was this issue in containers. It concerns the unfoldTreeM_BF function. An early go at rewriting it, inspired by levels above, looks like this:

unfoldForestM_BF :: Monad m => (b -> m (a, [b])) -> [b] -> m (Forest a)
unfoldForestM_BF f ts = b [ts] (const id)
  where
    b [] k = pure (k [] [])
    b qs k = foldl (foldr t) b qs [] (\x -> k [] . foldr (uncurry run) id x)

    t a fw bw k = do
        (x,cs) <- f a
        let !n = length cs
        fw (cs : bw) (k . (:) (x, n))

    run x n xs ys =
      case splitAt n ys of
          (cs,zs) -> Node x cs : xs zs

It basically performs the same this as the levels function, but builds the tree back up in the end using the run function. In order to do that, we store the length of each subforest on line 9, so that each node knows how much to take from each level.

A possible optimization is to stop taking the length. Anything in list processing that takes a length screams “wrong” to me (although it’s not always true!) so I often try to find a way to avoid it. The first option would be to keep the cs on line 8 around, and use it as an indicator for the length. That keeps it around longer than strictly necessary, though. The other option is to add a third level: for breadthFirst above, we had one level; for levels, we added another, to indicate the structure of the nodes and their subtrees; here, we can add a third, to maintain that structure when building back up:

unfoldForestM_BF :: Monad m => (b -> m (a, [b])) -> [b] -> m (Forest a)
unfoldForestM_BF f ts = b [ts] (\ls -> concat . ls)
  where
    b [] k = pure (k id [])
    b qs k = foldl g b qs [] (\ls -> k id . ls)

    g a xs qs k = foldr t (\ls ys -> a ys (k . run ls)) xs [] qs

    t a fw xs bw = f a >>= \(x,cs) -> fw (x:xs) (cs:bw)

    run x xs = uncurry (:) . foldl go ((,) [] . xs) x
      where
        go ys y (z:zs) = (Node y z : ys', zs')
          where
            (ys',zs') = ys zs

This unfortunately slows down the code.

Choose a random item from a list in one pass

Donnacha Oisín Kidney — Thu, 15 Mar 2018 00:00:00 UT

Posted on March 15, 2018

Tags: Haskell

Adapted from here.

import System.Random

choose :: (Foldable f, RandomGen g) =>  f a -> g -> (a, g)
choose xs g = h (foldl f (0 :: Integer, error "choose: empty list", g) xs)
  where
    h (_,x,g) = (x,g)
    f (c,y,g) x = case randomR (0,c) g of
        (0,g') -> (c+1,x,g')
        (_,g') -> (c+1,y,g')

Single-Pass Huffman Coding

Donnacha Oisín Kidney — Sat, 17 Feb 2018 00:00:00 UT

Posted on February 17, 2018

Tags: Haskell, Folds

While working on something else, I figured out a nice Haskell implementation of Huffman coding, and I thought I’d share it here. I’ll go through a few techniques for transforming a multi-pass algorithm into a single-pass one first, and then I’ll show how to use them for Huffman. If you just want to skip to the code, it’s provided at the end ¹.

The algorithm isn’t single-pass in the sense of Adaptive Huffman Coding: it still uses the normal Huffman algorithm, but the input is transformed in the same traversal that builds the tree to transform it.

Circular Programming

There are several techniques for turning multi-pass algorithms into single-pass ones in functional languages. Perhaps the most famous is circular programming: using laziness to eliminate a pass. R. S. Bird (1984) used this to great effect in solving the repmin problem:

Given a tree of integers, replace every integer with the minimum integer in the tree, in one pass.

For an imperative programmer, the problem is relatively easy: first, write the code to find the minimum value in the tree in the standard way, using a loop and a “smallest so far” accumulator. Then, inside the loop, after updating the accumulator, set the value of the leaf to be a reference to the accumulator.

At first, that solution may seem necessarily impure: we’re using global, mutable state to update many things at once. However, as the paper shows, we can claw back purity using laziness:

data Tree a = Leaf a | Tree a :*: Tree a

repMin :: Tree Integer -> Tree Integer
repMin xs = ys where
  (m, ys) = go xs
  go (Leaf x) = (x, Leaf m)
  go (xs :*: ys) = (min x y, xs' :*: ys')
    where
      (x,xs') = go xs
      (y,ys') = go ys

There and Back Again

Let’s say we don’t have laziness at our disposal: are we hosed? No ²! Danvy and Goldberg (2005) explore this very issue, by posing the question:

Given two lists, xs and ys, can you zip xs with the reverse of ys in one pass?

The technique used to solve the problem is named “There and Back Again”; it should be clear why from one of the solutions:

convolve xs ys = walk xs const where
  walk [] k = k [] ys
  walk (x:xs) k = walk xs (\r (y:ys) -> k ((x,y) : r) ys)

The traversal of one list builds up the function to consume the other. We could write repmin in the same way:

repMin = uncurry ($) . go where
  go (Leaf x) = (Leaf, x)
  go (xs :*: ys) = (\m -> xs' m :*: ys' m, min xm ym) where
    (xs',xm) = go xs
    (ys',ym) = go ys

Cayley Representations

If you’re doing a lot of appending to some list-like structure, you probably don’t want to use actual lists: you’ll end up traversing the left-hand-side of the append many more times than necessary. A type you can drop in to use instead is difference lists (Hughes 1986):

type DList a = [a] -> [a]

rep :: [a] -> DList a
rep = (++)

abs :: DList a -> [a]
abs xs = xs []

append :: DList a -> DList a -> DList a
append = (.)

append is $\mathcal{O}(1)$ in this representation. In fact, for any monoid with a slow mappend, you can use the same trick: it’s called the Cayley representation, and available as Endo in Data.Monoid.

rep :: Monoid a => a -> Endo a
rep x = Endo (mappend x)

abs :: Monoid a => Endo a -> a
abs (Endo f) = f mempty

instance Monoid (Endo a) where
  mempty = Endo id
  mappend (Endo f) (Endo g) = Enfo (f . g)

You can actually do the same transformation for “monoids” in the categorical sense: applying it to monads, for instance, will give you codensity (Rivas and Jaskelioff 2014).

Traversable

Looking back—just for a second—to the repmin example, we should be able to spot a pattern we can generalize. There’s really nothing tree-specific about it, so why can’t we apply it to lists? Or other structures, for that matter? It turns out we can: the mapAccumL function is tailor-made to this need:

repMin :: Traversable t => t Integer -> t Integer
repMin xs = ys where
  (~(Just m), ys) = mapAccumL f Nothing xs
  f Nothing x = (Just x, m)
  f (Just y) x = (Just (min x y), m)

The tilde before the Just ensures this won’t fail on empty input.

Huffman Coding

Finally, it’s time for the main event. Huffman coding is a very multi-pass algorithm, usually. The steps look like this:

Build a frequency table for each character in the input.
Build a priority queue from that frequency table.
Iteratively pop elements and combine them (into Huffman trees) from the queue until there’s only one left.
That Huffman tree can be used to construct the mapping from items back to their Huffman codes.
Traverse the input again, using the constructed mapping to replace elements with their codes.

We can’t skip any of these steps: we can try perform them all at once, though.

Let’s write the multi-pass version first. We’ll need the frequency table:

frequencies :: Ord a => [a] -> Map a Int
frequencies = Map.fromListWith (+) . map (flip (,) 1)

And a heap, ordered on the frequencies of its elements (I’m using a skew heap here):

data Heap a
  = Nil
  | Node {-# UNPACK #-} !Int a (Heap a) (Heap a)

instance Monoid (Heap a) where
  mappend Nil ys = ys
  mappend xs Nil = xs
  mappend h1@(Node i x lx rx) h2@(Node j y ly ry)
    | i <= j    = Node i x (mappend h2 rx) lx
    | otherwise = Node j y (mappend h1 ry) ly
  mempty = Nil

Next, we need to build the tree³. We can use the tree type from above.

buildTree :: Map a Int -> Maybe (Tree a)
buildTree = prune . toHeap where
  toHeap = Map.foldMapWithKey (\k v -> Node v (Leaf k) Nil Nil)
  prune Nil = Nothing
  prune (Node i x l r) = case mappend l r of
    Nil -> Just x
    Node j y l' r' ->
      prune (mappend (Node (i+j) (x :*: y) Nil Nil) (mappend l' r'))

Then, a way to convert between the tree and a map:

toMapping :: Ord a => Tree a -> Map a [Bool]
toMapping (Leaf x) = Map.singleton x []
toMapping (xs :*: ys) =
    Map.union (fmap (True:) (toMapping xs)) (fmap (False:) (toMapping ys))

And finally, putting the whole thing together:

huffman :: Ord a => [a] -> (Maybe (Tree a), [[Bool]])
huffman xs = (tree, map (mapb Map.!) xs) where
  freq = frequencies xs
  tree = buildTree freq
  mapb = maybe Map.empty toMapping tree

Removing the passes

The first thing to fix is the toMapping function: at every level, it calls union, a complex and expensive operation. However, union and empty form a monoid, so we can use the Cayley representation to reduce the calls to a minimum. Next, we want to get rid of the fmaps: we can do that by assembling a function to perform the fmap as we go, as in convolve⁴.

toMapping :: Ord a => Tree a -> Map a [Bool]
toMapping tree = go tree id Map.empty where
  go (Leaf x) k = Map.insert x (k [])
  go (xs :*: ys) k =
    go xs (k . (:) True) . go ys (k . (:) False)

Secondly, we can integrate the toMapping function with the buildTree function, removing another pass:

buildTree :: Ord a => Map a Int -> Maybe (Tree a, Map a [Bool])
buildTree = prune . toHeap where
  toHeap = Map.foldMapWithKey (\k v -> Node v (Leaf k, leaf k) Nil Nil)
  prune Nil = Nothing
  prune (Node i x l r) = case mappend l r of
    Nil -> Just (fmap (\k -> k id Map.empty) x)
    Node j y l' r' ->
      prune (mappend (Node (i+j) (cmb x y) Nil Nil) (mappend l' r'))
  leaf x k = Map.insert x (k [])
  node xs ys k = xs (k . (:) True) . ys (k . (:) False)
  cmb (xt,xm) (yt,ym) = (xt :*: yt, node xm ym)

Finally, to remove the second pass over the list, we can copy repmin, using mapAccumL to both construct the mapping and apply it to the structure in one go.

huffman :: (Ord a, Traversable t) => t a -> (Maybe (Tree a), t [Bool])
huffman xs = (fmap fst tree, ys) where
  (freq,ys) = mapAccumL f Map.empty xs
  f fm x = (Map.insertWith (+) x 1 fm, mapb Map.! x)
  tree = buildTree freq
  mapb = maybe Map.empty snd tree

And that’s it!

Generalization

The similarity between the repmin function and the solution above is suggestive: is there a way to encode this idea of making a multi-pass algorithm single-pass? Of course! We can use an applicative:

data Circular a b c =
    Circular !a
             (b -> c)

instance Functor (Circular a b) where
    fmap f (Circular tally run) = Circular tally (f . run)

instance Monoid a =>
         Applicative (Circular a b) where
    pure x = Circular mempty (const x)
    Circular fl fr <*> Circular xl xr =
        Circular
            (mappend fl xl)
            (\r -> fr r (xr r))

liftHuffman
    :: Ord a
    => a -> Circular (Map a Int) (Map a [Bool]) [Bool]
liftHuffman x = Circular (Map.singleton x 1) (Map.! x)

runHuffman
    :: Ord a
    => Circular (Map a Int) (Map a [Bool]) r -> (Maybe (Tree a), r)
runHuffman (Circular smry run) =
    maybe (Nothing, run Map.empty) (Just *** run) (buildTree smry)

huffman
    :: (Ord a, Traversable t)
    => t a -> (Maybe (Tree a), t [Bool])
huffman = runHuffman . traverse liftHuffman

Thanks to it being an applicative, you can do all the fun lensy things with it:

showBin :: [Bool] -> String
showBin = map (bool '0' '1')

>>> let liftBin = fmap showBin . liftHuffman
>>> (snd . runHuffman . (each.traverse) liftBin) ("abb", "cad", "c")
(["01","11","11"],["00","01","10"],["00"])

Bringing us back to the start, it can also let us solve repmin!

liftRepMin :: a -> Circular (Option (Min a)) a a
liftRepMin x = Circular (pure (pure x)) id

runRepMin :: Circular (Option (Min a)) a b -> b
runRepMin (Circular m r) = r (case m of
  Option (Just (Min x)) -> x)

repMin :: (Ord a, Traversable t) => t a -> t a
repMin = runRepMin . traverse liftRepMin

So the Circular type is actually just the product of reader and writer, and is closely related to the sort type.

It’s also related to the Prescient type, which I noticed after I’d written the above.

References

Bird, R. S. 1984. “Using Circular Programs to Eliminate Multiple Traversals of Data.” Acta Inf. 21 (3) (October): 239–250. doi:10.1007/BF00264249. http://dx.doi.org/10.1007/BF00264249.

Bird, Richard, Geraint Jones, and Oege De Moor. 1997. “More haste‚ less speed: Lazy versus eager evaluation.” Journal of Functional Programming 7 (5) (September): 541–547. doi:10.1017/S0956796897002827. https://ora.ox.ac.uk/objects/uuid:761a4646-60a2-4622-a1e0-ddea11507d57/datastreams/ATTACHMENT01.

Danvy, Olivier, and Mayer Goldberg. 2005. “There and Back Again.” http://brics.dk/RS/05/3/BRICS-RS-05-3.pdf.

Hughes, R. John Muir. 1986. “A Novel Representation of Lists and Its Application to the Function "Reverse".” Information Processing Letters 22 (3) (March): 141–144. doi:10.1016/0020-0190(86)90059-1. http://www.sciencedirect.com/science/article/pii/0020019086900591.

Pippenger, Nicholas. 1997. “Pure Versus Impure Lisp.” ACM Trans. Program. Lang. Syst. 19 (2) (March): 223–238. doi:10.1145/244795.244798. http://doi.acm.org/10.1145/244795.244798.

Rivas, Exequiel, and Mauro Jaskelioff. 2014. “Notions of Computation as Monoids.” arXiv:1406.4823 [cs, math] (May). http://arxiv.org/abs/1406.4823.

Huffman coding single-pass implementation:

import           Data.Map.Strict  (Map)
import qualified Data.Map.Strict  as Map
import           Data.Traversable (mapAccumL)

data Heap a
  = Nil
  | Node {-# UNPACK #-} !Int a (Heap a) (Heap a)

instance Monoid (Heap a) where
  mappend Nil ys = ys
  mappend xs Nil = xs
  mappend h1@(Node i x lx rx) h2@(Node j y ly ry)
    | i <= j    = Node i x (mappend h2 rx) lx
    | otherwise = Node j y (mappend h1 ry) ly
  mempty = Nil

data Tree a = Leaf a | Tree a :*: Tree a

buildTree :: Ord a => Map a Int -> Maybe (Tree a, Map a [Bool])
buildTree = prune . toHeap where
  toHeap = Map.foldMapWithKey (\k v -> Node v (Leaf k, leaf k) Nil Nil)
  prune Nil = Nothing
  prune (Node i x l r) = case mappend l r of
    Nil -> Just (fmap (\k -> k id Map.empty) x)
    Node j y l' r' ->
      prune (mappend (Node (i+j) (cmb x y) Nil Nil) (mappend l' r'))
  leaf x k = Map.insert x (k [])
  node xs ys k = xs (k . (:) True) . ys (k . (:) False)
  cmb (xt,xm) (yt,ym) = (xt :*: yt, node xm ym)

huffman :: (Ord a, Traversable t) => t a -> (Maybe (Tree a), t [Bool])
huffman xs = (fmap fst tree, ys) where
  (freq,ys) = mapAccumL f Map.empty xs
  f fm x = (Map.insertWith (+) x 1 fm, mapb Map.! x)
  tree = buildTree freq
  mapb = maybe Map.empty snd tree

↩︎

Well, that’s a little bit of a lie. In terms of asympostics, Pippenger (1997) stated a problem that could be solved in linear time in impure Lisp, but $\Omega(n \log n)$ in pure Lisp. R. Bird, Jones, and Moor (1997) then produced an algorithm that could solve the problem in linear time, by using laziness. So, in some cases, laziness will give you asymptotics you can’t get without it (if you want to stay pure).↩︎
There’s actually a nicer version of the buildTree function which uses StateT (Heap a) Maybe, but it’s equivalent to this one under the hood, and I though might be a little distracting.↩︎
Something to notice about this function is that it’s going top-down and bottom-up at the same time. Combining the maps (with (.)) is done bottom-up, but building the codes is top-down. This means the codes are built in reverse order! That’s why the accumulating parameter (k) is a difference list, rather than a normal list. As it happens, if normal lists were used, the function would be slightly more efficient through sharing, but the codes would all be reversed.↩︎

Monadic List Functions

Donnacha Oisín Kidney — Sun, 11 Feb 2018 00:00:00 UT

Posted on February 11, 2018

Tags: Haskell, Applicative

Here’s an old Haskell chestnut:

>>> filterM (\_ -> [False, True]) [1,2,3]
[[],[3],[2],[2,3],[1],[1,3],[1,2],[1,2,3]]

filterM (\_ -> [False,True]) gives the power set of some input list. It’s one of the especially magical demonstrations of monads. From a high-level perspective, it makes sense: for each element in the list, we want it to be present in one output, and not present in another. It’s hard to see how it actually works, though. The (old¹) source for filterM doesn’t help hugely, either:

filterM          :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
filterM _ []     =  return []
filterM p (x:xs) =  do
   flg <- p x
   ys  <- filterM p xs
   return (if flg then x:ys else ys)

Again, elegant and beautiful (aside from the three-space indent), but opaque. Despite not really getting how it works, I was encouraged by its simplicity to try my hand at some of the other functions from Data.List.

Grouping

Let’s start with the subject of my last post. Here’s the implementation:

groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy p xs = build (\c n ->
  let f x a q
        | q x = (x : ys, zs)
        | otherwise = ([], c (x : ys) zs)
        where (ys,zs) = a (p x)
  in snd (foldr f (const ([], n)) xs (const False)))

It translates over pretty readily:

groupByM :: Applicative m => (a -> a -> m Bool) -> [a] -> m [[a]]
groupByM p xs =
  fmap snd (foldr f (const (pure ([], []))) xs (const (pure (False))))
  where
    f x a q = liftA2 st (q x) (a (p x)) where
      st b (ys,zs)
        | b = (x : ys, zs)
        | otherwise = ([], (x:ys):zs)

Let’s try it with a similar example to filterM:

>>> groupByM (\_ _ -> [False, True]) [1,2,3]
[[[1],[2],[3]],[[1],[2,3]],[[1,2],[3]],[[1,2,3]]]

It gives the partitions of the list!

Sorting

So these monadic generalisations have been discovered before, several times over. There’s even a package with monadic versions of the functions in Data.List. Exploring this idea with a little more formality is the paper “All Sorts of Permutations” (Christiansen, Danilenko, and Dylus 2016), and accompanying presentation on YouTube. They show that the monadic version of sort produces permutations of the input list, and examine the output from different sorting algorithms. Here’s a couple of their implementations, altered slightly:

insertM :: Monad m => (a -> a -> m Bool) -> a -> [a] -> m [a]
insertM _ x [] = pure [x]
insertM p x yys@(y:ys) = do
  lte <- p x y
  if lte
    then pure (x:yys)
    else fmap (y:) (insertM p x ys)

insertSortM :: Monad m => (a -> a -> m Bool) -> [a] -> m [a]
insertSortM p = foldrM (insertM p) []

partitionM :: Applicative m => (a -> m Bool) -> [a] -> m ([a],[a])
partitionM p = foldr f (pure ([],[])) where
  f x = liftA2 ifStmt (p x) where
    ifStmt flg (tr,fl)
      | flg = (x:tr,fl)
      | otherwise = (tr,x:fl)
      
quickSortM :: Monad m => (a -> a -> m Bool) -> [a] -> m [a]
quickSortM p [] = pure []
quickSortM p (x:xs) = do
  (gt,le) <- partitionM (p x) xs
  ls <- quickSortM p le
  gs <- quickSortM p gt
  pure (ls ++ [x] ++ gs)

>>> insertSortM (\_ _ -> [False,True]) [1,2,3]
[[1,2,3],[1,3,2],[3,1,2],[2,1,3],[2,3,1],[3,2,1]]

>>> quickSortM (\_ _ -> [False,True]) [1,2,3]
[[3,2,1],[2,3,1],[2,1,3],[3,1,2],[1,3,2],[1,2,3]]

As it should be easy to see, they’re very concise and elegant, and strongly resemble the pure versions of the algorithms.

State

So the examples above are very interesting and cool, but they don’t necessarily have a place in real Haskell code. If you wanted to find the permutations, partitions, or power set of a list you’d probably use a more standard implementation. That’s not to say that these monadic functions have no uses, though: especially when coupled with State they yield readable and fast implementations for certain tricky functions. ordNub, for instance:

ordNub :: Ord a => [a] -> [a]
ordNub =
  flip evalState Set.empty .
  filterM
    (\x -> do
       flg <- gets (Set.notMember x)
       when flg (modify (Set.insert x))
       pure flg)

Alternatively, using a monadic version of maximumOn:

maximumOnM :: (Applicative m, Ord b) => (a -> m b) -> [a] -> m (Maybe a)
maximumOnM p = (fmap . fmap) snd . foldl f (pure Nothing)
  where
    f a e = liftA2 g a (p e)
      where
        g Nothing q = Just (q, e)
        g b@(Just (o, y)) q
          | o < q = Just (q, e)
          | otherwise = b

You can write a one-pass mostFrequent:

mostFrequent :: Ord a => [a] -> Maybe a
mostFrequent =
  flip evalState Map.empty .
  maximumOnM
    (\x -> maybe 1 succ <$> state (Map.insertLookupWithKey (const (+)) x 1))

Decision Trees

One of the nicest things about the paper was the diagrams of decision trees provided for each sorting algorithm. I couldn’t find a library to do that for me, so I had a go at producing my own. First, we’ll need a data type to represent the tree itself:

data DecTree t a
  = Pure a
  | Choice t (DecTree t a) (DecTree t a)
  deriving Functor

We’ll say the left branch is “true” and the right “false”. Applicative and monad instances are relatively mechanical²:

instance Applicative (DecTree t) where
  pure = Pure
  Pure f <*> xs = fmap f xs
  Choice c ls rs <*> xs = Choice c (ls <*> xs) (rs <*> xs)
  
instance Monad (DecTree t) where
  Pure x >>= f = f x
  Choice c ls rs >>= f = Choice c (ls >>= f) (rs >>= f)

We can now create a comparator function that constructs one of these trees, and remembers the values it was given:

traceCompare :: a -> a -> DecTree (a,a) Bool
traceCompare x y = Choice (x,y) (Pure True) (Pure False)

Finally, to draw the tree, I’ll use a function from my binary tree library:

printDecTree :: (Show a, Show b) => String -> DecTree (a,a) b -> IO ()
printDecTree rel t = putStr (drawTreeWith id (go t) "") where
  go (Pure xs) = Node (show xs) Leaf Leaf
  go (Choice (x,y) tr fl) =
    Node (show x ++ rel ++ show y) (go tr) (go fl)

And we get these really nice diagrams out:

>>> (printDecTree "<=" . insertSortM traceCompare) [1,2,3]

         ┌[1,2,3]
    ┌1<=2┤
    │    │    ┌[2,1,3]
    │    └1<=3┤
    │         └[2,3,1]
2<=3┤
    │    ┌[1,3,2]
    └1<=3┤
         │    ┌[3,1,2]
         └1<=2┤
              └[3,2,1]

>>> (printDecTree "<=" . quickSortM traceCompare) [1,2,3]

              ┌[1,2,3]
         ┌2<=3┤
         │    └[1,3,2]
    ┌1<=3┤
    │    └[3,1,2]
1<=2┤
    │    ┌[2,1,3]
    └1<=3┤
         │    ┌[2,3,1]
         └2<=3┤
              └[3,2,1]

We can also try it out with the other monadic list functions:

>>> (printDecTree "=" . groupByM traceCompare) [1,2,3]

       ┌[[1,2,3]]
   ┌2=3┤
   │   └[[1,2],[3]]
1=2┤
   │   ┌[[1],[2,3]]
   └2=3┤
       └[[1],[2],[3]]

Applicative

You might notice that none of these “monadic” functions actually require a monad constraint: they’re all applicative. There’s a straightforward implementation that relies only on applicative for most of these functions, with a notable exception: sort. Getting that to work with just applicative is the subject of a future post.

References

Christiansen, Jan, Nikita Danilenko, and Sandra Dylus. 2016. “All Sorts of Permutations (Functional Pearl).” In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, 168–179. ICFP 2016. New York, NY, USA: ACM. doi:10.1145/2951913.2951949. http://informatik.uni-kiel.de/~sad/icfp2016-preprint.pdf.

The definition has since been updated to more modern Haskell: it now uses a fold, and only requires Applicative.↩︎
Part of the reason the instances are so mechanical is that this type strongly resembles the free monad:

data Free f a = Pure a | Free (f (Free f a))

In fact, the example given in the MonadFree class is the following:

data Pair a = Pair a a

type Tree = Free Pair

The only difference with the above type and the decision tree is that the decision tree carries a tag with it.

So what’s so interesting about this relationship? Well, Pair is actually a representable functor. Any representable functor f a can be converted to (and from) a function key -> a, where key is the specific key for f. The key for Pair is Bool: the result of the function we passed in to the sorting functions!

In general, you can make a “decision tree” for any function of type a -> b like so:

type DecTree a b r = Rep f ~ b => Free (Compose ((,) a) f) r

But more on that in a later post.↩︎

groupBy

Donnacha Oisín Kidney — Sun, 07 Jan 2018 00:00:00 UT

Posted on January 7, 2018

Tags: Haskell, Folds

Here’s a useful function from Data.List:

groupBy :: (a -> a -> Bool) -> [a] -> [[a]]

groupBy (==) "aabcdda"
-- ["aa","b","c","dd","a"]

However, as has been pointed out before¹, groupBy expects an equivalence relation, and can exhibit surprising behavior when it doesn’t get one. Let’s say, for instance, that we wanted to group numbers that were close together:

groupClose :: [Integer] -> [[Integer]]
groupClose = groupBy (\x y -> abs (x - y) < 3)

What would you expect on the list [1, 2, 3, 4, 5]? All in the same group? Well, what you actually get is:

[[1,2,3],[4,5]]

This is because the implementation of groupBy only compares to the first element in each group:

groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

Brandon Simmons gave a definition of groupBy that is perhaps more useful, but it used explicit recursion, rather than a fold.

A definition with foldr turned out to be trickier than I expected. I found some of the laziness properties especially difficult:

>>> head (groupBy (==) (1:2:undefined))
[1]
>>> (head . head) (groupBy (==) (1:undefined))
1
>>> (head . head . tail) (groupBy (==) (1:2:undefined))
2

Here’s the definition I came up with, after some deliberation:

groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy p xs = build (\c n ->
  let f x a q
        | q x = (x : ys, zs)
        | otherwise = ([], c (x : ys) zs)
        where (ys,zs) = a (p x)
  in snd (foldr f (const ([], n)) xs (const False)))
{-# INLINE groupBy #-}

Seemingly benign changes to the function will break one or more of the above tests. In particular, the laziness of a “where” binding needs to be taken into account. Here’s an early attempt which failed:

groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy p xs = build (\c n -> 
  let f x a q d
        | q x = a (p x) (d . (:) x)
        | otherwise = d [] (a (p x) (c . (:) x))
  in foldr f (\_ d -> d [] n) xs (const False) (\ _ y -> y))

Once done, though, it works as expected:

>>> groupBy (==) "aaabcccdda"
["aaa","b","ccc","dd","a"]
>>> groupBy (==) []
[]
>>> groupBy (<=) [1,2,2,3,1,2,0,4,5,2]
[[1,2,2,3],[1,2],[0,4,5],[2]]

It’s the fastest version I could find that obeyed the above laziness properties.

The GHC page on the issue unfortunately seems to indicate the implementation won’t be changed. Ah, well. Regardless, I have a repository with the implementation above (with extra fusion machinery added) and comparisons to other implementations.

There are several threads on the libraries mailing list on this topic:

2006

mapAccumL - find max in-sequence subsequence

2007

Data.List.groupBy with non-transitive equality predicate (this is the longest discussion on the topic)

2008

Generalize groupBy in a useful way?

2009

nubBy seems broken in recent GHCs

↩︎

Unfoldl

Donnacha Oisín Kidney — Thu, 14 Dec 2017 00:00:00 UT

Posted on December 14, 2017

Tags: Haskell

{-# LANGUAGE LambdaCase #-}

module Unfoldl where

import GHC.Base (build)
import Data.Tuple (swap)

unfoldl :: (b -> Maybe (a, b)) -> b -> [a]
unfoldl f b =
    build
        (\c n ->
              let r a = maybe a (uncurry (r . (`c` a))) . f
              in r n b)

-- | >>> toDigs 10 123
-- [1,2,3]
toDigs :: (Integral a, Num a) => a -> a -> [a]
toDigs base =
  unfoldl (\case
    0 -> Nothing
    n -> Just (swap (n `quotRem` base)))

Balancing Folds

Donnacha Oisín Kidney — Mon, 30 Oct 2017 00:00:00 UT

Posted on October 30, 2017

Part 1 of a 3-part series on Balanced Folds

Tags: Haskell, Folds

There are three main ways to fold things in Haskell: from the right, from the left, and from either side. Let’s look at the left vs right variants first. foldr works from the right:

foldr (+) 0 [1,2,3]
1 + (2 + (3 + 0))

And foldl from the left:

foldl (+) 0 [1,2,3]
((0 + 1) + 2) + 3

As you’ll notice, the result of the two operations above is the same (6; although one may take much longer than the other). In fact, whenever the result of foldr and foldl is the same for a pair of arguments (in this case + and 0), we say that that pair forms a Monoid for some type (well, there’s some extra stuff to do with 0, but I only care about associativity at the moment). In this case, the Sum monoid is formed:

newtype Sum a = Sum { getSum :: a }

instance Num a => Monoid (Sum a) where
  mempty = Sum 0
  mappend (Sum x) (Sum y) = Sum (x + y)

When you know that you have a monoid, you can use the foldMap function: this is the third kind of fold. It says that you don’t care which of foldl or foldr is used, so the implementer of foldMap can put the parentheses wherever they want:

foldMap Sum [1,2,3]
(1 + 2) + (3 + 0)
0 + ((1 + 2) + 3)
((0 + 1) + 2) + 3

And we can’t tell the difference from the result. This is a pretty bare-bones introduction to folds and monoids: you won’t need to know more than that for the rest of this post, but the topic area is fascinating and deep, so don’t let me give you the impression that I’ve done anything more than scratched the surface.

Other Ways to Fold

Quite often, we do care about where the parentheses go. Take, for instance, a binary tree type, with values at the leaves:

data Tree a
  = Empty
  | Leaf a
  | Tree a :*: Tree a

instance Show a =>
         Show (Tree a) where
    show Empty = "()"
    show (Leaf x) = show x
    show (l :*: r) = "(" ++ show l ++ "*" ++ show r ++ ")"

We can’t (well, shouldn’t) us foldMap here, because we would be able to tell the difference between different arrangements of parentheses:

>>> foldMap something [1,2,3]

((1*2)*(3*())) │ (()*((1*2)*3)) │ (((()*1)*2)*3)
───────────────┼────────────────┼───────────────
       ┌1      │      ┌()       │       ┌()
      ┌┤       │      ┤         │      ┌┤
      │└2      │      │ ┌1      │      │└1
      ┤        │      │┌┤       │     ┌┤
      │┌3      │      ││└2      │     │└2
      └┤       │      └┤        │     ┤
       └()     │       └3       │     └3

So we use one of the folds which lets us choose the arrangements of parentheses:

>>> (foldr (:*:) Empty . map Leaf) [1,2,3,4,5,6]
(1*(2*(3*(4*(5*(6*()))))))
     ┌1
    ┌┤
    │└2
   ┌┤
   │└3
  ┌┤
  │└4
 ┌┤
 │└5
┌┤
│└6
┤
└()

>>> (foldl (:*:) Empty . map Leaf) [1,2,3,4,5,6]
((((((()*1)*2)*3)*4)*5)*6)
┌()
┤
│┌1
└┤
 │┌2
 └┤
  │┌3
  └┤
   │┌4
   └┤
    │┌5
    └┤
     └6

The issue is that neither of the trees generated are necessarily what we want: often, we want something more balanced.

TreeFold

To try and find a more balanced fold, let’s (for now) assume we’re always going to get non-empty input. This will let us simplify the Tree type a little, to:

data Tree a
  = Leaf a
  | Tree a :*: Tree a
  deriving Foldable

instance Show a =>
         Show (Tree a) where
    show (Leaf x) = show x
    show (l :*: r) = "(" ++ show l ++ "*" ++ show r ++ ")"

Then, we can use Jon Fairbairn’s fold described in this email, adapted a bit for our non-empty input:

import Data.List.NonEmpty (NonEmpty(..))

treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f = go
  where
    go (x :| []) = x
    go (a :| b:l) = go (f a b :| pairMap l)
    pairMap (x:y:rest) = f x y : pairMap rest
    pairMap xs = xs

There are two parts to this function: pairMap and the go helper. pairMap combines adjacent elements in the list using the combining function. As a top-level function it might look like this:

pairMap f (x:y:rest) = f x y : pairMap f rest
pairMap f xs = xs

pairMap (++) ["a","b","c","d","e"]
-- ["ab","cd","e"]

As you can see, it leaves any leftovers untouched at the end of the list.

The go helper applies pairMap repeatedly to the list until it has only one element. This gives us much more balanced results that foldl or foldr (turn on -XOverloadedLists to write non-empty lists using this syntax):

>>> (treeFold (:*:) . fmap Leaf) [1,2,3,4,5,6]
(((1*2)*(3*4))*(5*6))
  ┌1
 ┌┤
 │└2
┌┤
││┌3
│└┤
│ └4
┤
│┌5
└┤
 └6

>>> (treeFold (:*:) . fmap Leaf) [1,2,3,4,5,6,7,8]
(((1*2)*(3*4))*((5*6)*(7*8)))
  ┌1
 ┌┤
 │└2
┌┤
││┌3
│└┤
│ └4
┤
│ ┌5
│┌┤
││└6
└┤
 │┌7
 └┤
  └8

However, there are still cases where one branch will be much larger than its sibling. The fold fills a balanced binary tree from the left, but any leftover elements are put at the top level. In other words:

>>> (treeFold (:*:) . fmap Leaf) [1..9]
((((1*2)*(3*4))*((5*6)*(7*8)))*9)
   ┌1
  ┌┤
  │└2
 ┌┤
 ││┌3
 │└┤
 │ └4
┌┤
││ ┌5
││┌┤
│││└6
│└┤
│ │┌7
│ └┤
│  └8
┤
└9

That 9 hanging out on its own there is a problem.

Typewriters and Slaloms

One observation we can make is that pairMap always starts from the same side on each iteration, like a typewriter moving from one line to the next. This has the consequence of building up the leftovers on one side, leaving them until the top level.

We can improve the situation slightly by going back and forth, slalom-style, so we consume leftovers on each iteration:

treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f = goTo where
  
  goTo (y :| []) = y
  goTo (a :| b : rest) = goFro (pairMap f (f a b) rest)
  goFro (y :| []) = y
  goFro (a :| b : rest) = goTo (pairMap (flip f) (f b a) rest)

  pairMap f = go [] where
    go ys y (a:b:rest) = go (y:ys) (f a b) rest
    go ys y [z] = z :| y : ys
    go ys y [] = y :| ys

Notice that we have to flip the combining function to make sure the ordering is the same on output. For the earlier example, this solves the issue:

>>> (treeFold (:*:) . fmap Leaf) [1..9]
(((1*2)*((3*4)*(5*6)))*((7*8)*9))
  ┌1
 ┌┤
 │└2
┌┤
││ ┌3
││┌┤
│││└4
│└┤
│ │┌5
│ └┤
│  └6
┤
│ ┌7
│┌┤
││└8
└┤
 └9

It does not build up the tree as balanced as it possibly could, though:

>>> (treeFold (:*:) . fmap Leaf) [1,2,3,4,5,6]
((1*2)*((3*4)*(5*6)))
 ┌1
┌┤
│└2
┤
│ ┌3
│┌┤
││└4
└┤
 │┌5
 └┤
  └6

There’s four elements in the right branch, and two in the left in the above example. Three in each would be optimal.

Wait—optimal in what sense, exactly? What do we mean when we say one tree is more balanced than another? Let’s say the “balance factor” is the largest difference in size of two sibling trees:

balFac :: Tree a -> Integer
balFac = fst . go where
  go :: Tree a -> (Integer, Integer)
  go (Leaf _) = (0, 1)
  go (l :*: r) = (lb `max` rb `max` abs (rs - ls), rs + ls) where
    (lb,ls) = go l
    (rb,rs) = go r

And one tree is more balanced than another if it has a smaller balance factor.

There’s effectively no limit on the balance factor for the typewriter method: when the input is one larger than a power of two, it’ll stick the one extra in one branch and the rest in another (as with [1..9] in the example above).

For the slalom method, it looks like there’s something more interesting going on, limit-wise. I haven’t been able to verify this formally (yet), but from what I can tell, a tree of height $n$ will have at most a balance factor of the $n$ th Jacobsthal number. That’s (apparently) also the number of ways to tie a tie using $n + 2$ turns.

That was just gathered from some quick experiments and oeis.org, but it seems to make sense intuitively. Jacobsthal numbers are defined like this:

j 0 = 0
j 1 = 1
j n = j (n-1) + 2 * j (n-2)

So, at the top level, there’s the imbalance caused by the second-last pairFold, plus the imbalance caused by the third-to-last. However, the third-to-last imbalance is twice what it was at that level, because it is now working with an already-paired-up list. Why isn’t the second last imbalance also doubled? Because it’s counteracted by the fact that we turned around: the imbalance is in an element that’s a leftover element. At least that’s what my intuition is at this point.

The minimum balance factor is, of course, one. Unfortunately, to achieve that, I lost some of the properties of the previous folds:

Lengths

Up until now, I have been avoiding taking the length of the incoming list. It would lose a lot of laziness, cause an extra traversal, and generally seems like an ugly solution. Nonetheless, it gives the most balanced results I could find so far:

treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f (x:|xs) = go (length (x:xs)) (x:xs) where
  go 1 [y] = y
  go n ys = f (go m a) (go (n-m) b) where
    (a,b) = splitAt m ys 
    m = n `div` 2

splitAt is an inefficient operation, but if we let the left-hand call return its unused input from the list, we can avoid it:

treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f (x:|xs) = fst (go (length (x:xs)) (x:xs)) where
  go 1 (y:ys) = (y,ys)
  go n ys = (f l r, rs) where
    (l,ls) = go m ys
    (r,rs) = go (n-m) ls
    m = n `div` 2

Finally, you may have spotted the state monad in this last version. We can make the similarity explicit:

treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f (x:|xs) = evalState (go (length (x:xs))) (x:xs) where
  go 1 = state (\(y:ys) -> (y,ys))
  go n = do
    let m = n `div` 2
    l <- go m
    r <- go (n-m)
    return (f l r)

And there you have it: three different ways to fold in a more balanced way. Perhaps surprisingly, the first is the fastest in my tests. I’d love to hear if there’s a more balanced version (which is lazy, ideally) that is just as efficient as the first implementation.

Stable Summation

I have found two other uses for these folds other than simply constructing more balanced binary trees. The first is summation of floating-point numbers. If you sum floating-point numbers in the usual way with foldl' (or, indeed, with an accumulator in an imperative language), you will see an error growth of $\mathcal{O}(n)$ , where $n$ is the number of floats you’re summing.

A well-known solution to this problem is the Kahan summation algorithm. It carries with it a running compensation for accumulating errors, giving it $\mathcal{O}(1)$ error growth. There are two downsides to the algorithm: it takes four times the number of numerical operations to perform, and isn’t parallel.

For that reason, it’s often not used in practice: instead, floats are summed pairwise, in a manner often referred to as cascade summation. This is what’s used in NumPy. The error growth isn’t quite as good— $\mathcal{O}(\log{n})$ —but it takes the exact same number of operations as normal summation. On top of that:

Parallelization

Dividing a fold into roughly-equal chunks is exactly the kind of problem encountered when trying to parallelize certain algorithms. Adapting the folds above so that their work is performed in parallel is surprisingly easy:

splitPar :: (a -> a -> a) -> (Int -> a) -> (Int -> a) -> Int -> a
splitPar f = go
  where
    go l r 0 = f (l 0) (r 0)
    go l r n = lt `par` (rt `pseq` f lt rt)
      where
        lt = l (n-m)
        rt = r m
        m = n `div` 2

treeFoldParallel :: (a -> a -> a) -> NonEmpty a -> a
treeFoldParallel f xs =
    treeFold const (splitPar f) xs numCapabilities

The above will split the fold into numCapabilities chunks, and perform each one in parallel. numCapabilities is a constant defined in GHC.Conc: it’s the number of threads which can be run simultaneously at any one time. Alternatively, you could the function include a parameter for how many chunks to split the computation into. You could also have the fold adapt as it went, choosing whether or not to spark based on how many sparks exist at any given time:

parseq :: a -> b -> b
parseq a b =
    runST
        (bool (par a b) (seq a b) <$>
         unsafeIOToST (liftA2 (>) numSparks getNumCapabilities))

treeFoldAdaptive :: (a -> a -> a) -> a -> [a] -> a
treeFoldAdaptive f =
    Lazy.treeFold
        (\l r ->
              r `parseq` (l `parseq` f l r))

Adapted from this comment by Edward Kmett. This is actually the fastest version of all the folds.

All of this is provided in a library I’ve put up on Hackage.

Convolutions and Semirings

Donnacha Oisín Kidney — Fri, 13 Oct 2017 00:00:00 UT

Posted on October 13, 2017

Tags: Haskell, Semirings

I have been working a little more on my semirings library recently, and I have come across some interesting functions in the process. First, a quick recap on the Semiring class and some related functions:

class Semiring a where
  one :: a
  zero :: a
  infixl 6 <+>
  (<+>) :: a -> a -> a
  infixl 7 <.>
  (<.>) :: a -> a -> a

add :: (Foldable f, Semiring a) => f a -> a
add = foldl' (<+>) zero

mul :: (Foldable f, Semiring a) => f a -> a
mul = foldl' (<.>) one

instance Semiring Integer where
  one = 1
  zero = 0
  (<+>) = (+)
  (<.>) = (*)

instance Semiring Bool where
  one = True
  zero = False
  (<+>) = (||)
  (<.>) = (&&)

You can think of it as a replacement for Num, but it turns out to be much more generally useful than that.

Matrix Multiplication

The first interesting function is to do with matrix multiplication. Here’s the code for multiplying two matrices represented as nested lists:

mulMatrix :: Semiring a => [[a]] -> [[a]] -> [[a]]
mulMatrix xs ys = map (\row -> map (add . zipWith (<.>) row) cs) xs
  where
    cs = transpose ys

One of the issues with this code (other than its woeful performance) is that it seems needlessly list-specific. zipWith seems like the kind of thing that exists on a bunch of different structures. Indeed, the ZipList wrapper uses zipWith as its <*> implementation. Let’s try that for now:

mulMatrix :: (Semiring a, Applicative f) => f (f a) -> f (f a) -> f (f a)
mulMatrix xs ys = fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
  where
    cs = transpose ys

Of course, now add needs to work on our f, so it should be Foldable

mulMatrix 
  :: (Semiring a, Applicative f, Foldable f) 
  => f (f a) -> f (f a) -> f (f a)
mulMatrix = ...

transpose is the missing piece now. A little bit of Applicative magic can help us out again, though: sequenceA is transpose on ZipLists (McBride and Paterson 2008).

mulMatrix 
  :: (Semiring a, Applicative f, Traversable f) 
  => f (f a) -> f (f a) -> f (f a)
mulMatrix xs ys = 
    fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
  where
    cs = sequenceA ys

One further generalization: The two fs don’t actually need to be the same:

mulMatrix
    :: (Applicative n
       ,Traversable m
       ,Applicative m
       ,Applicative p
       ,Semiring a)
    => n (m a) -> m (p a) -> n (p a)
mulMatrix xs ys = fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
  where
    cs = sequenceA ys

Happily, the way that the wrappers (n, m, and p) match up coincides precisely with how matrix dimensions match up in matrix multiplication. Quoting from the Wikipedia definition:

if $A$ is an $n \times m$ matrix and $B$ is an $m \times p$ matrix, their matrix product $AB$ is an $n \times p$ matrix

This function is present in the linear package with some different constraints. In fairness, Applicative probably isn’t the best thing to use here since it doesn’t work for so many instances (MonadZip or something similar may be more suitable), but it’s very handy to have, and works out-of the box for types like:

data Three a 
  = Three a a a 
  deriving (Functor, Foldable, Traversable, Eq, Ord, Show)

instance Applicative Three where
  pure x = Three x x x
  Three fx fy fz <*> Three xx xy xz = Three (fx xx) (fy xy) (fz xz)

Which makes it (to my mind) useful enough to keep. Also, it hugely simplified the code for matrix multiplication in square matrices I had, from Okasaki (1999).

Convolutions

If you’re putting a general class in a library that you want people to use, and there exist sensible instances for common Haskell types, you should probably provide those instances in the library to avoid orphans. The meaning of “sensible” here is vague: generally speaking, if there is only one obvious or clear instance, then it’s sensible. For a list instance for the semiring class, for instance, I could figure out several law-abiding definitions for <+>, one and zero, but only one for <.>: polynomial multiplication. You know, where you multiply two polynomials like so:

$(x^3 + 2x + 3)(5x + 3x^2 + 4) = 9x^5 + 15x^4 + 18x^3 + 28x^2 + 38x + 24$

A more general definition looks something like this:

$(a_0x^0 + a_1x^1 + a_2x^2)(b_0x^0 + b_1x^1 + b_2x^2) =$ $a_0b_0x^0 + (a_0b_1 + a_1b_0)x^1 + (a_0b_2 + a_1b_1 + a_2b_0)x^2 + (a_1b_2 + a_2b_1)x^3 + a_2b_2x^4$

Or, fully generalized:

$c_k = a_0b_k + a_1b_{k-1} + \ldots + a_{k-1}b_1 + a_kb_0$ $f(x) \times g(x) = \sum_{i=0}^{n+m}c_ix^i$

So it turns out that you can represent polynomials pretty elegantly as lists. Take an example from above:

$x^3 + 2x + 3$

And rearrange it in order of the powers of $x$ :

$3x^0 + 2x^1 + x^3$

And fill in missing coefficients:

$3x^0 + 2x^1 + 0x^2 + 1x^3$

And then the list representation of that polynomial is the list of those coefficients:

[3, 2, 0, 1]

For me, the definitions of multiplication above were pretty hard to understand. In Haskell, however, the definition is quite beautiful:

instance Semiring a => Semiring [a] where
  one = [one]
  zero = []
  [] <+> ys = ys
  xs <+> [] = xs
  (x:xs) <+> (y:ys) = x <+> y : (xs <+> ys)
  _ <.> [] = []
  [] <.> _ = []
  (x:xs) <.> (y:ys) = (x<.>y) : map (x<.>) ys <+> xs <.> (y:ys)

This definition for <.> can be found on page 4 of McIlroy (1999). Although there was a version of the paper with a slightly different definition:

_ <.> [] = []
[] <.> _ = []
(x:xs) <.> (y:ys) 
  = (x<.>y) : (map (x<.>) ys <+> map (<.>y) xs <+> (zero : (xs <.> ys)))

Similar to one which appeared in Dolan (2013).

As it happens, I prefer the first definition. It’s shorter, and I figured out how to write it as a fold:

_ <.> [] = []
xs <.> ys = foldr f [] xs where
  f x zs = map (x <.>) ys <+> (zero : zs)

And if you inline the <+>, you get a reasonable speedup:

xs <.> ys = foldr f [] xs
  where
    f x zs = foldr (g x) id ys (zero : zs)
    g x y a (z:zs) = x <.> y <+> z : a zs
    g x y a [] = x <.> y : a []

The definition of <+> can also use a fold on either side for fusion purposes:

(<+>) = foldr f id where
  f x xs (y:ys) = x <+> y : xs ys
  f x xs [] = x : xs []

(<+>) = flip (foldr f id) where
  f y ys (x:xs) = x <+> y : ys xs
  f y ys [] = y : ys []

There are rules in the library to choose one of the above definitions if fusion is available.

This definition is much more widely useful than it may seem at first. Say, for instance, you wanted to search through pairs of things from two infinite lists. You can’t use the normal way to pair things for lists, the Cartesian product, because it will diverge:

[(x,y) | x <- [1..], y <- [1..]]
-- [(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),(1,8),(1,9),(1,10)...

You’ll never get beyond 1 in the first list. Zipping isn’t an option either, because you won’t really explore the search space, only corresponding pairs. Brent Yorgey showed that if you want a list like this:

[(y,x-y) | x <- [0..], y <- [0..x] ]
-- [(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0)...

Then what you’re looking for is a convolution (the same thing as polynomial multiplication). <.> above can be adapted readily:

convolve :: [a] -> [b] -> [[(a,b)]]
convolve xs ys = foldr f [] xs
  where
    f x zs = foldr (g x) id ys ([] : zs)
    g x y a (z:zs) = ((x, y) : z) : a zs
    g x y a [] = [(x, y)] : a []

Flatten out this result to get your ordering. This convolution is a little different from the one in the blog post. By inlining <+> we can avoid the expensive ++ function, without using difference lists.

Long Multiplication

Here’s another cool use of lists as polynomials: they can be used as a positional numeral system. Most common numeral systems are positional, including Arabic (the system you most likely use, where twenty-four is written as 24) and binary. Non-positional systems are things like Roman numerals. Looking at the Arabic system for now, we see that the way of writing down numbers:

$1989$

Can be thought of the sum of each digit multiplied by ten to the power of its position:

$1989 = 1 \times 10^3 \plus 9 \times 10^2 \plus 8 \times 10^1 \plus 9 \times 10^0$ $1989 = 1 \times 1000 \plus 9 \times 100 \plus 8 \times 10 \plus 9 \times 1$ $1989 = 1000 \plus 900 \plus 80 \plus 9$ $1989 = 1989$

Where the positions are numbered from the right. In other words, it’s our polynomial list from above in reverse. As well as that, the convolution is long multiplication.

Now, taking this straight off we can try some examples:

-- 12 + 15 = 27
[2, 1] <+> [5, 1] == [7, 2]

-- 23 * 2 = 46
[3, 2] <.> [2] == [6, 4]

The issue, of course, is that we’re not handling carrying properly:

[6] <+> [6] == [12]

No matter: we can perform all the carries after the addition, and everything works out fine:

carry
    :: Integral a
    => a -> [a] -> [a]
carry base xs = foldr f (toBase base) xs 0
  where
    f e a cin = r : a q where
      (q,r) = quotRem (cin + e) base
        
toBase :: Integral a => a -> a -> [a]
toBase base = unfoldr f where
  f 0 = Nothing
  f n = Just (swap (quotRem n base))

Wrap the whole thing in a newtype and we can have a Num instance:

newtype Positional 
  = Positional 
  { withBase :: Integer -> [Integer] 
  } 

instance Num Positional where
  Positional x + Positional y = Positional (carry <*> x <+> y)
  Positional x * Positional y = Positional (carry <*> x <.> y)
  fromInteger m = Positional (\base -> toBase base m)
  abs = id
  signum = id
  negate = id
  
toDigits :: Integer -> Positional -> [Integer]
toDigits base p = reverse (withBase p base)

This also lets us choose our base after the fact:

sumHundred = (sum . map fromInteger) [1..100]
toDigits 10 sumHundred
-- [5,0,5,0]
toDigits 2 sumHundred
-- [1,0,0,1,1,1,0,1,1,1,0,1,0]

Vectors

All the hand-optimizing, inlining, and fusion magic in the world won’t make a list-based implementation of convolution faster than a proper one on vectors, unfortunately. In particular, for larger vectors, a fast Fourier transform can be used. Also, usually code like this will be parallelized, rather than sequential. That said, it can be helpful to implement the slower version on vectors, in the usual indexed way, for comparison’s sake:

instance Semiring a =>
         Semiring (Vector a) where
    one = Vector.singleton one
    zero = Vector.empty
    xs <+> ys =
        case compare (Vector.length xs) (Vector.length ys) of
            EQ -> Vector.zipWith (<+>) xs ys
            LT -> Vector.unsafeAccumulate (<+>) ys (Vector.indexed xs)
            GT -> Vector.unsafeAccumulate (<+>) xs (Vector.indexed ys)
    signal <.> kernel
      | Vector.null signal = Vector.empty
      | Vector.null kernel = Vector.empty
      | otherwise = Vector.generate (slen + klen - 1) f
      where
        f n =
            foldl'
                (\a k ->
                      a <+>
                      Vector.unsafeIndex signal k <.>
                      Vector.unsafeIndex kernel (n - k))
                zero
                [kmin .. kmax]
          where
            !kmin = max 0 (n - (klen - 1))
            !kmax = min n (slen - 1)
        !slen = Vector.length signal
        !klen = Vector.length kernel

Search

As has been observed before (Rivas, Jaskelioff, and Schrijvers 2015) there’s a pretty suggestive similarity between semirings and the Applicative/Alternative classes in Haskell:

class Semiring a where
  one :: a
  zero :: a
  (<+>) :: a -> a -> a
  (<.>) :: a -> a -> a

class Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

class Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a

So can our implementation of convolution be used to implement the methods for these classes? Partially:

newtype Search f a = Search { runSearch :: [f a] }

instance Functor f => Functor (Search f) where
  fmap f (Search xs) = Search ((fmap.fmap) f xs)

instance Alternative f => Applicative (Search f) where
  pure x = Search [pure x]
  _ <*> Search [] = Search []
  Search xs <*> Search ys = Search (foldr f [] xs) where
    f x zs = foldr (g x) id ys (empty : zs)
    g x y a (z:zs) = (x <*> y <|> z) : a zs
    g x y a [] = (x <*> y) : a []

instance Alternative f => Alternative (Search f) where
  Search xs <|> Search ys = Search (go xs ys) where
    go [] ys = ys
    go xs [] = xs
    go (x:xs) (y:ys) = (x <|> y) : go xs ys
  empty = Search []

At first, this seems perfect: the types all match up, and the definitions seem sensible. The issue is with the laws: Applicative and Alternative are missing four that semirings require. In particular: commutativity of plus, annihilation by zero, and distributivity left and right:

xs <|> ys = ys <|> xs
empty <*> xs = fs <*> empty = empty
fs <*> (xs <|> ys) = fs <*> xs <|> fs <*> ys
(fs <|> gs) <*> xs = fs <*> xs <|> gs <*> ys

The vast majority of the instances of Alternative today fail one or more of these laws. Taking lists as an example, ++ obviously isn’t commutative, and <*> only distributes when it’s on the right.

What’s the problem, though? Polynomial multiplication follows more laws than those required by Applicative: why should that worry us? Unfortunately, in order for multiplication to follow those laws, it actually relies on the underlying semiring being law-abiding. And it fails the applicative laws when it isn’t.

There are two angles from which we could come at this problem: either we relax the semiring laws and try and make our implementation of convolution rely on them as little as possible, or we find Alternative instances which follow the semiring laws. Or we could meet in the middle, relaxing the laws as much as possible until we find some Alternatives that meet our standards.

This has actually been accomplished in several papers: the previously mentioned Rivas, Jaskelioff, and Schrijvers (2015) discusses near-semirings, defined as semiring-like structures with associativity, identity, and these two laws:

$0 \times x = 0$ $(x \plus y) \times z = (x \times z) \plus (y \times z)$

In contrast to normal semirings, zero only annihilates when it’s on the left, and multiplication only distributes over addition when it’s on the right. Addition is not required to be commutative.

The lovely paper Spivey (2009) has a similar concept: a “bunch”.

class Bunch m where
  return :: a -> m a
  (>>=) :: m a -> (a -> m b) -> m b
  zero :: m a
  (<|>) :: m a -> m a -> m a
  wrap :: m a -> m a

The laws are all the same (with <*> implemented in terms of >>=), and the extra wrap operation can be expressed like so:

wrap :: Alternative f => Search f a -> Search f a
wrap (Search xs) = Search (empty : xs)

A definition of >>= for our polynomials is also provided:

[] >>= _ = []
(x:xs) >>= f = foldr (<|>) empty (fmap f x) <|> wrap (xs >>= f)

This will require the underlying f to be Foldable. We can inline a little, and express the whole thing as a fold:

instance (Foldable f, Alternative f) => Monad (Search f) where
  Search xs >>= k = foldr f empty xs where
    f e a = foldr ((<|>) . k) (wrap a) e

For Search to meet the requirements of a bunch, the paper notes that the f must be assumed to be a bag, i.e., the order of its elements must be ignored.

Kiselyov et al. (2005) kind of goes the other direction, defining a monad which has fair disjunction and conjunction. Unfortunately, the fair conjunction loses associativity.

Distance

The end of the paper on algebras for combinatorial search wonders if notions of distance could be added to some of the algebras. I think that should be as simple as supplying a suitable near-semiring for f, but the definition of >>= would need to be changed. The near-semiring I had in mind was the probability monad. It works correctly if inlined:

newtype Search s a = Search { runSearch :: [[(a,s)]] }

instance Functor (Search s) where
  fmap f (Search xs) = Search ((fmap.fmap.first) f xs)

instance Semiring s => Applicative (Search s) where
  pure x = Search [[(x,one)]]
  _ <*> Search [] = Search []
  Search xs <*> Search ys = Search (foldr f [] xs) where
    f x zs = foldr (g x) id ys (empty : zs)
    g x y a (z:zs) = (m x y ++ z) : a zs
    g x y a [] = (m x y) : a []
    m ls rs = [(l r, lp<.>rp) | (l,lp) <- ls, (r,rp) <- rs]

instance Semiring s => Alternative (Search s) where
  Search xs <|> Search ys = Search (go xs ys) where
    go [] ys = ys
    go xs [] = xs
    go (x:xs) (y:ys) = (x ++ y) : go xs ys
  empty = Search []

wrap :: Search s a -> Search s a
wrap (Search xs) = Search ([] : xs)

instance Semiring s => Monad (Search s) where
  Search xs >>= k = foldr f empty xs where
    f e a = foldr ((<|>) . uncurry (mulIn . k)) (wrap a) e
    mulIn (Search x) xp = Search ((fmap.fmap.fmap) (xp<.>) x)

But I couldn’t figure out how to get it to work for a more generalized inner monad. The above could probably be sped up, or randomized, using the many well-known techniques for probability monad optimization.

References

Dolan, Stephen. 2013. “Fun with semirings: A functional pearl on the abuse of linear algebra.” In, 48:101. ACM Press. doi:10.1145/2500365.2500613. https://www.cl.cam.ac.uk/~sd601/papers/semirings.pdf.

Kiselyov, Oleg, Chung-chieh Shan, Daniel P Friedman, and Amr Sabry. 2005. “Backtracking, interleaving, and terminating monad transformers (functional pearl).” ACM SIGPLAN Notices 40 (9): 192–203. http://okmij.org/ftp/Computation/monads.html#LogicT.

McBride, Conor, and Ross Paterson. 2008. “Applicative programming with effects.” Journal of functional programming 18 (01): 1–13. http://strictlypositive.org/Idiom.pdf.

McIlroy, M. Douglas. 1999. “Power Series, Power Serious.” J. Funct. Program. 9 (3) (May): 325–337. doi:10.1017/S0956796899003299. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.333.3156&rep=rep1&type=pdf.

Okasaki, Chris. 1999. “From Fast Exponentiation to Square Matrices: An Adventure in Types.” In Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP’99), Paris, France, September 27-29, 1999, 34:28. ACM. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.456.357&rep=rep1&type=pdf.

Spivey, J. Michael. 2009. “Algebras for combinatorial search.” Journal of Functional Programming 19 (3-4) (July): 469–487. doi:10.1017/S0956796809007321. https://pdfs.semanticscholar.org/db3e/373bb6e7e7837ebc524da0a25903958554ed.pdf.

Applicative Arithmetic

Donnacha Oisín Kidney — Mon, 25 Sep 2017 00:00:00 UT

Posted on September 25, 2017

Tags: Haskell, Applicative

Safer Arithmetic

There are a couple partial functions in the Haskell Prelude which people seem to agree shouldn’t be there. head, for example, will throw an error on an empty list. Most seem to agree that it should work something more like this:

head :: Foldable f => f a -> Maybe a
head = foldr (const . Just) Nothing

There are other examples, like last, !!, etc.

One which people don’t agree on, however, is division by zero. In the current Prelude, the following will throw an error:

1 / 0

The “safe” version might have a signature like this:

(/) :: Fractional a => a -> a -> Maybe a

However, this turns out to be quite a headache for writing code generally. So the default is the (somewhat) unsafe version.

Is there a way to introduce a safer version without much overhead, so the programmer is given the option? Of course! With some newtype magic, it’s pretty simple to write a wrapper which catches division by zero in some arbitrary monad:

newtype AppNum f a = AppNum
    { runAppNum :: f a
    } deriving (Functor,Applicative,Monad,Alternative,Show,Eq,MonadFail)

instance (Num a, Applicative f) =>
         Num (AppNum f a) where
    abs = fmap abs
    signum = fmap signum
    (+) = liftA2 (+)
    (*) = liftA2 (*)
    (-) = liftA2 (-)
    negate = fmap negate
    fromInteger = pure . fromInteger

instance (Fractional a, MonadFail f, Eq a) =>
         Fractional (AppNum f a) where
    fromRational = pure . fromRational
    xs / ys =
        ys >>=
        \case
            0 -> fail "divide by zero"
            y -> fmap (/ y) xs

I’m using the -XLambdaCase extension and MonadFail here.

Free Applicatives

You’ll notice that you only need Applicative for most of the arithmetic operations above. In fact, you only need Monad when you want to examine the contents of f. Using that fact, we can manipulate expression trees using the free applicative from the free package. Say, for instance, we want to have free variables in our expressions. Using Either, it’s pretty easy:

type WithVars = AppNum (Ap (Either String)) Integer

var :: String -> WithVars
var = AppNum . liftAp . Left

We can collect the free variables from an expression:

vars :: WithVars -> [String]
vars = runAp_ (either pure (const [])) . runAppNum

x = 1 :: WithVars
y = var "y"
z = var "z"

vars (x + y + z) -- ["y","z"]

If we want to sub in, though, we’re going to run into a problem: we can’t just pass in a Map String Integer because you’re able to construct values like this:

bad :: AppNum (Ap (Either String)) (Integer -> Integer -> Integer)
bad = AppNum (liftAp (Left "oh noes"))

We’d need to pass in a Map String (Integer -> Integer -> Integer) as well; in fact you’d need a map for every possible type. Which isn’t feasible.

GADTs

Luckily, we can constrain the types of variables in our expression so that they’re always Integer, using a GADT:

data Variable a where
        Constant :: a -> Variable a
        Variable :: String -> Variable Integer

The type above seems useless on its own: it doesn’t have a Functor instance, never mind an Applicative, so how can it fit into AppNum?

The magic comes from the free applicative, which converts any type of kind Type -> Type into an applicative. With that in mind, we can change around the previous code:

type WithVars = AppNum (Ap Variable) Integer

var :: String -> WithVars
var = AppNum . liftAp . Variable

vars :: WithVars -> [String]
vars = runAp_ f . runAppNum
  where
    f :: Variable a -> [String]
    f (Constant _) = []
    f (Variable s) = [s]

And write the function to sub in for us:

variableA
    :: Applicative f
    => (String -> f Integer) -> Variable a -> f a
variableA _ (Constant x) = pure x
variableA f (Variable s) = f s

variable :: (String -> Integer) -> Variable a -> a
variable _ (Constant x) = x
variable f (Variable s) = f s

replace :: Map String Integer -> WithVars -> Integer
replace m = runAp (variable (m Map.!)) . runAppNum

replace (Map.fromList [("z",2), ("y",3)]) (x + y + z)
-- 6

Accumulation

This will fail if a free variable isn’t present in the map, unfortunately. To fix it, we could use Either instead of Identity:

replace :: Map String Integer -> WithVars -> Either String Integer
replace m =
    runAp
        (variableA $
         \s ->
              maybe (Left s) Right (Map.lookup s m)) .
    runAppNum

But this only gives us the first missing variable encountered. We’d like to get back all of the missing variables, ideally: accumulating the Lefts. Either doesn’t accumulate values, as if it did it would break the monad laws.

There’s no issue with the applicative laws, though, which is why the validation package provides a non-monadic either-like type, which we can use here.

replace :: Map String Integer -> WithVars -> AccValidation [String] Integer
replace m =
    runAp
        (variableA $
         \s ->
              maybe (AccFailure [s]) pure (Map.lookup s m)) .
    runAppNum

replace (Map.fromList []) (x + y + z)
-- AccFailure ["y","z"]

Other uses

There are a bunch more applicatives you could use instead of Either. Using lists, for instance, you could calculate the possible outcomes from a range of inputs:

range :: WithVars -> [Integer]
range = runAp (variable (const [1..3])) . runAppNum

range (x + y + z)
-- [3,4,5,4,5,6,5,6,7]

Or you could ask the user for input:

query :: WithVars -> IO Integer
query = runAp (variable f) . runAppNum
  where
    f s = do
      putStr "Input a value for "
      putStrLn s
      fmap read getLine

Finally, and this one’s a bit exotic, you could examine every variable in turn, with defaults for the others:

zygo
    :: (forall x. f x -> x)
    -> (forall x. f x -> (x -> a) -> b)
    -> Ap f a
    -> [b]
zygo (l :: forall x. f x -> x) (c :: forall x. f x -> (x -> a) -> b) =
    fst . go id
  where
    go :: forall c. (c -> a) -> Ap f c -> ([b], c)
    go _ (Pure x) = ([], x)
    go k (Ap x f) = (c x (k . ls) : xs, ls lx)
      where
        (xs,ls) = go (k . ($ lx)) f
        lx = l x

examineEach :: WithVars -> [Integer -> Integer]
examineEach = zygo (variable (const 1)) g . runAppNum
  where
    g :: Variable a -> (a -> b) -> Integer -> b
    g (Constant x) rhs _ = rhs x
    g (Variable _) rhs i = rhs i

This produces a list of functions which are equivalent to subbing in for each variable with the rest set to 1.

Verifying Data Structures in Haskell

Donnacha Oisín Kidney — Sun, 23 Apr 2017 00:00:00 UT

Posted on April 23, 2017

Tags: Haskell, Dependent Types, Data Structures

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeInType #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE RebindableSyntax #-}

{-# OPTIONS_GHC -fplugin GHC.TypeLits.Normalise #-}

module VerifiedDataStructures where

import Data.Kind hiding (type (*))
import Data.Type.Equality
import Unsafe.Coerce
import GHC.TypeLits hiding (type (<=))
import Data.Proxy
import Data.Coerce
import Prelude

A while ago I read this post on reddit (by David Feuer), about sorting traversables (which was a follow-up on this post by Will Fancher), and I was inspired to write some pseudo-dependently-typed Haskell. The post (and subsequent library) detailed how to use size-indexed heaps to perform fast, total sorting on any traversable. I ended up with a library which has five size-indexed heaps (Braun, pairing, binomial, skew, and leftist), each verified for structural correctness. I also included the non-indexed implementations of each for comparison (as well as benchmarks, tests, and all that good stuff).

The purpose of this post is to go through some of the tricks I used and problems I encountered writing a lot of type-level code in modern Haskell.

Type-Level Numbers in Haskell

In order to index things by their size, we’ll need a type-level representation of size. We’ll use Peano numbers for now:

data Peano = Z | S Peano

Z stands for zero, and S for successor. The terseness is pretty necessary here, unfortunately: arithmetic becomes unreadable otherwise. The simplicity of this definition is useful for proofs and manipulation; however any runtime representation of these numbers is going to be woefully slow.

With the DataKinds extension, the above is automatically promoted to the type-level, so we can write type-level functions (type families) on the Peano type:

type family Plus (n :: Peano) (m :: Peano) :: Peano where
        Plus Z m = m
        Plus (S n) m = S (Plus n m)

Here the TypeFamilies extension is needed. I’ll try and mention every extension I’m using as we go, but I might forget a few, so check the repository for all of the examples (quick aside: I did manage to avoid using UndecidableInstances, but more on that later). One pragma that’s worth mentioning is:

{-# OPTIONS_GHC -fno-warn-unticked-promoted-constructors #-}

This suppresses warnings on the definition of Plus above. Without it, GHC would want us to write:

type family Plus (n :: Peano) (m :: Peano) :: Peano where
        Plus 'Z m = m
        Plus ('S n) m = 'S (Plus n m)

I think that looks pretty ugly, and it can get much worse with more involved arithmetic. The only thing I have found the warnings useful for is []: the type-level empty list gives an error in its unticked form.

Using the Type-Level Numbers with a Pairing Heap

In the original post, a pairing heap (Fredman et al. 1986) was used, for its simplicity and performance. The implementation looked like this:

data Heap n a where
  E :: Heap Z a
  T :: a -> HVec n a -> Heap (S n) a

data HVec n a where
  HNil :: HVec Z a
  HCons :: Heap m a -> HVec n a -> HVec (Plus m n) a

You immediately run into trouble when you try to define merge:

merge :: Ord a => Heap m a -> Heap n a -> Heap (Plus m n) a
merge E ys = ys
merge xs E = xs
merge h1@(T x xs) h2@(T y ys)
  | x <= y = T x (HCons h2 xs)
  | otherwise = T y (HCons h1 ys)

Three errors show up here, but we’ll look at the first one:

Could not deduce (m ~ (Plus m Z))

GHC doesn’t know that $x = x + 0$ . Somehow, we’ll have to prove that it does.

Singletons

In a language with true dependent types, proving the proposition above is as simple as:

plusZeroNeutral : (n : Nat) -> n + 0 = n
plusZeroNeutral Z = Refl
plusZeroNeutral (S k) = cong (plusZeroNeutral k)

(this example is in Idris)

In Haskell, on the other hand, we can’t do the same: functions on the value-level Peano have no relationship with functions on the type-level Peano. There’s no way to automatically link or promote one to the other.

This is where singletons come in (Eisenberg and Weirich 2012). A singleton is a datatype which mirrors a type-level value exactly, except that it has a type parameter which matches the equivalent value on the type-level. In this way, we can write functions on the value-level which are linked to the type-level. Here’s a potential singleton for Peano:

data Natty n where
    Zy :: Natty Z
    Sy :: Natty n -> Natty (S n)

(we need GADTs for this example)

Now, when we pattern-match on Natty, we get a proof of whatever its type parameter was. Here’s a trivial example:

isZero :: Natty n -> Maybe (n :~: Z)
isZero Zy = Just Refl
isZero (Sy _) = Nothing

When we match on Zy, the only value which n could have been is Z, because the only way to construct Zy is if the type parameter is Z.

Using this technique, the plusZeroNeutral proof looks reasonably similar to the Idris version:

plusZeroNeutral :: Natty n -> Plus n Z :~: n
plusZeroNeutral Zy = Refl
plusZeroNeutral (Sy n) = case plusZeroNeutral n of
    Refl -> Refl

To generalize the singletons a little, we could probably use the singletons library, or we could roll our own:

data family The k :: k -> Type

data instance The Peano n where
    Zy :: The Peano Z
    Sy :: The Peano n -> The Peano (S n)

plusZeroNeutral :: The Peano n -> Plus n Z :~: n
plusZeroNeutral Zy = Refl
plusZeroNeutral (Sy n) = case plusZeroNeutral n of
    Refl -> Refl

The The naming is kind of cute, I think. It makes the signature look almost like the Idris version (the is a function from the Idris standard library). The The type family requires the TypeInType extension, which I’ll talk a little more about later.

Proof Erasure and Totality

There’s an issue with these kinds of proofs: the proof code runs every time it is needed. Since the same value is coming out the other end each time (Refl), this seems wasteful.

In a language like Idris, this problem is avoided by noticing that you’re only using the proof for its type information, and then erasing it at runtime. In Haskell, we can accomplish the same with a rule:

{-# NOINLINE plusZeroNeutral #-}

{-# RULES
"plusZeroNeutral" forall x. plusZeroNeutral x 
  = unsafeCoerce (Refl :: 'Z :~: 'Z)
 #-}

This basically says “if this type-checks, then the proof must exist, and therefore the proof must be valid. So don’t bother running it”. Unfortunately, that’s a little bit of a lie. It’s pretty easy to write a proof which type-checks that isn’t valid:

falseIsTrue :: False :~: True
falseIsTrue = falseIsTrue

We won’t be able to perform computations which rely on this proof in Haskell, though: because the computation will never terminate, the proof will never provide an answer. This means that, while the proof isn’t valid, it is type safe. That is, of course, unless we use our manual proof-erasure technique. The RULES pragma will happily replace it with the unsafeCoerce version, effectively introducing unsoundness into our proofs. The reason that this doesn’t cause a problem for language like Idris is that Idris has a totality checker: you can’t write the above definition (with the totality checker turned on) in Idris.

So what’s the solution? Do we have to suffer through the slower proof code to maintain correctness? In reality, it’s usually OK to assume termination. It’s pretty easy to see that a proof like plusZeroNeutral is total. It’s worth bearing in mind, though, that until Haskell gets a totality checker (likely never, apparently) these proofs aren’t “proper”.

Generating Singletons

One extra thing: while you’re proving things in one area of your code, you might not have the relevant singleton handy. To generate them on-demand, you’ll need a typeclass:

class KnownSing (x :: k) where
    sing :: The k x

instance KnownSing Z where
    sing = Zy

instance KnownSing n => KnownSing (S n) where
    sing = Sy sing

This kind of drives home the inefficiency of singleton-based proofs, and why it’s important to erase them aggressively.

Proofs Bundled with the Data Structure

One other way to solve these problems is to try find a data structure which runs the proof code anyway. As an example, consider a length-indexed list:

infixr 5 :-
data List n a where
    Nil :: List Z a
    (:-) :: a -> List n a -> List (S n) a

You might worry that concatenation of two lists requires some expensive proof code, like merge for the pairing heap. Maybe surprisingly, the default implementation just works:

infixr 5 ++
(++) :: List n a -> List m a -> List (Plus n m) a
(++) Nil ys = ys
(++) (x :- xs) ys = x :- xs ++ ys

Why? Well, if you look back to the definition of Plus, it’s almost exactly the same as the definition of (++). In effect, we’re using lists as the singleton for Peano here.

The question is, then: is there a heap which performs these proofs automatically for functions like merge? As far as I can tell: almost. First though:

Small Digression: Manipulating and Using the Length-Indexed List

The standard definition of ++ on normal lists can be cleaned up a little with foldr

(++) :: [a] -> [a] -> [a]
(++) = flip (foldr (:))

Can we get a similar definition for our length-indexed lists? Turns out we can, but the type of foldr needs to be a little different:

foldrList :: (forall x. a -> b x -> b (S x)) 
          -> b m -> List n a -> b (n + m)
foldrList f b Nil = b
foldrList f b (x :- xs) = f x (foldrList f b xs)

newtype Flip (f :: t -> u -> Type) (a :: u) (b :: t) 
    = Flip { unFlip :: f b a }

foldrList1 :: (forall x. a -> b x c -> b (S x) c) 
           -> b m c -> List n a -> b (n + m) c
foldrList1 f b 
    = unFlip . foldrList (\e -> Flip . f e . unFlip) (Flip b)

infixr 5 ++
(++) :: List n a -> List m a -> List (n + m) a
(++) = flip (foldrList1 (:-))

So what’s the point of this more complicated version? Well, if this were normal Haskell, we might get some foldr-fusion or something (in reality we would probably use augment if that were the purpose).

With this type-level business, though, there’s a similar application: loop unrolling. Consider the natural-number type again. We can write a typeclass which will perform induction over them:

class KnownPeano (n :: Peano)  where
    unrollRepeat :: Proxy n -> (a -> a) -> a -> a

instance KnownPeano Z where
    unrollRepeat _ = const id
    {-# INLINE unrollRepeat #-}

instance KnownPeano n =>
         KnownPeano (S n) where
    unrollRepeat (_ :: Proxy (S n)) f x =
        f (unrollRepeat (Proxy :: Proxy n) f x)
    {-# INLINE unrollRepeat #-}

Because the recursion here calls a different unrollRepeat function in the “recursive” call, we get around the usual hurdle of not being able to inline recursive calls. That means that the whole loop will be unrolled, at compile-time. We can do the same for foldr:

class HasFoldr (n :: Peano) where
    unrollFoldr 
        :: (forall x. a -> b x -> b (S x)) 
        -> b m 
        -> List n a 
        -> b (n + m)
  
instance HasFoldr Z where
    unrollFoldr _ b _ = b
    {-# INLINE unrollFoldr #-}

instance HasFoldr n => HasFoldr (S n) where
    unrollFoldr f b (x :- xs) = f x (unrollFoldr f b xs)
    {-# INLINE unrollFoldr #-}

I can’t think of many uses for this technique, but one that comes to mind is an n-ary uncurry (like Lisp’s apply):

infixr 5 :-
data List (xs :: [*]) where
        Nil :: List '[]
        (:-) :: a -> List xs -> List (a ': xs)

class KnownList (xs :: [*])  where
    foldrT
        :: (forall y ys. y -> result ys -> result (y ': ys))
        -> result '[]
        -> List xs
        -> result xs

instance KnownList ('[] :: [*]) where
    foldrT _ = const
    {-# INLINE foldrT #-}

instance KnownList xs =>
         KnownList (x ': xs) where
    foldrT f b (x :- xs) = f x (foldrT f b xs)
    {-# INLINE foldrT #-}

type family Func (xs :: [*]) (y :: *) where
        Func '[] y = y
        Func (x ': xs) y = x -> Func xs y

newtype FunType y xs = FunType
    { runFun :: Func xs y -> y
    }

uncurry
    :: KnownList xs
    => Func xs y -> List xs -> y
uncurry f l =
    runFun
        (foldrT
             (c (\x g h -> g (h x)))
             (FunType id)
             l)
        f
  where
    c :: (a -> ((Func xs y -> y) -> (Func zs z -> z)))
      -> (a -> (FunType y xs -> FunType z zs))
    c = coerce
    {-# INLINE c #-}
{-# INLINE uncurry #-}

I think that you can be guaranteed the above is inlined at compile-time, making it essentially equivalent to a handwritten uncurry.

Binomial Heaps

Anyway, back to the size-indexed heaps. The reason that (++) worked so easily on lists is that a list can be thought of as the data-structure equivalent to Peano numbers. Another numeric-system-based data structure is the binomial heap, which is based on binary numbering (I’m going mainly off of the description from Hinze 1999).

So, to work with binary numbers, let’s get some preliminaries on the type-level out of the way:

data instance The Bool x where
    Falsy :: The Bool False
    Truey :: The Bool True

data instance The [k] xs where
    Nily :: The [k] '[]
    Cony :: The k x -> The [k] xs -> The [k] (x : xs)

instance KnownSing True where
    sing = Truey

instance KnownSing False where
    sing = Falsy

instance KnownSing '[] where
    sing = Nily

instance (KnownSing xs, KnownSing x) =>
         KnownSing (x : xs) where
    sing = Cony sing sing

We’ll represent a binary number as a list of Booleans:

type family Sum (x :: Bool) (y :: Bool) (cin :: Bool) :: Bool where
        Sum False False False = False
        Sum False False True  = True
        Sum False True  False = True
        Sum False True  True  = False
        Sum True  False False = True
        Sum True  False True  = False
        Sum True  True  False = False
        Sum True  True  True  = True

type family Carry (x :: Bool) (y :: Bool) (cin :: Bool)
     (xs :: [Bool]) (ys :: [Bool]) :: [Bool] where
        Carry False False False xs ys = Add False xs ys
        Carry False False True  xs ys = Add False xs ys
        Carry False True  False xs ys = Add False xs ys
        Carry False True  True  xs ys = Add True  xs ys
        Carry True  False False xs ys = Add False xs ys
        Carry True  False True  xs ys = Add True  xs ys
        Carry True  True  False xs ys = Add True  xs ys
        Carry True  True  True  xs ys = Add True  xs ys

type family Add (cin :: Bool) (xs :: [Bool]) (ys :: [Bool]) ::
     [Bool] where
        Add c (x : xs) (y : ys) = Sum x y c : Carry x y c xs ys
        Add False '[] ys = ys
        Add False xs '[] = xs
        Add True  '[] ys = CarryOne ys
        Add True  xs '[] = CarryOne xs

type family CarryOne (xs :: [Bool]) :: [Bool] where
        CarryOne '[] = True : '[]
        CarryOne (False : xs) = True : xs
        CarryOne (True  : xs) = False : CarryOne xs

The odd definition of Carry is to avoid UndecidableInstances: if we had written, instead:

type family Carry (x :: Bool) (y :: Bool) (cin :: Bool) :: Bool where
        Carry False False False = False
        Carry False False True  = False
        Carry False True  False = False
        Carry False True  True  = True
        Carry True  False False = False
        Carry True  False True  = True
        Carry True  True  False = True
        Carry True  True  True  = True

type family Add (cin :: Bool) (xs :: [Bool]) (ys :: [Bool]) ::
     [Bool] where
        Add c (x : xs) (y : ys) = Sum x y c : Add (Carry x y c) xs ys
        Add False '[] ys = ys
        Add False xs '[] = xs
        Add True  '[] ys = CarryOne ys
        Add True  xs '[] = CarryOne xs

We would have been warned about nested type-family application.

Now we can base the merge function very closely on these type families. First, though, we’ll have to implement the heap.

Almost-Verified Data Structures

There are different potential properties you can verify in a data structure. In the sort-traversable post, the property of interest was that the number of elements in the structure would stay the same after adding and removing some number $n$ of elements. For this post, we’ll also verify structural invariants. I won’t, however, verify the heap property. Maybe in a later post.

When indexing a data structure by its size, you encode an awful lot of information into the type signature: the type becomes very specific to the structure in question. It is possible, though, to encode a fair few structural invariants without getting so specific. Here’s a signature for “perfect leaf tree”:

data BalTree a = Leaf a | Node (BalTree (a,a))

With that signature, it’s impossible to create a tree with more elements in its left branch than its right; the size of the tree, however, remains unspecified. You can use a similar trick to implement matrices which must be square (from Okasaki 1999): the usual trick (type Matrix n a = List n (List n a)) is too specific, providing size information at compile-time. If you’re interested in this approach, there are several more examples in Hinze (2001).

It is possible to go from the size-indexed version back to the non-indexed version, with an existential (RankNTypes for this example):

data ErasedSize f a = forall (n :: Peano). ErasedSize
    { runErasedSize :: f n a
    }

This will let you prove invariants in your implementation using an index, while keeping the user-facing type signature general and non-indexed.

A Fully-Structurally-Verified Binomial Heap

Wasserman (2010), was able to encode all of the structural invariants of the binomial heap without indexing by its size (well, all invariants except truncation, which turned out to be important a little later). I’ll be using a similar approach, except I’ll leverage some of the newer bells and whistles in GHC. Where Wasserman’s version used types like this for the numbering:

data Zero a = Zero
data Succ rk a = BinomTree rk a :< rk a
data BinomTree rk a = BinomTree a (rk a)

We can reuse the type-level Peano numbers with a GADT:

infixr 5 :-
data Binomial xs rk a where
       Nil :: Binomial '[] n a
       Skip :: Binomial xs (S rk) a -> Binomial (False : xs) rk a
       (:-) :: Tree rk a 
            -> Binomial xs (S rk) a 
            -> Binomial (True : xs) rk a

data Tree rk a = Root a (Node rk a)

infixr 5 :<
data Node n a where
       NilN :: Node Z a
       (:<) :: Tree n a -> Node n a -> Node (S n) a

The definition of Tree here ensures that any tree of rank $n$ has $2^n$ elements. The binomial heap, then, is a list of trees, in ascending order of size, with a True at every point in its type-level list where a tree is present, and a False wherever one is absent. In other words, the type-level list is a binary encoding of the number of elements it contains.

And here are the merge functions:

mergeTree :: Ord a => Tree rk a -> Tree rk a -> Tree (S rk) a
mergeTree xr@(Root x xs) yr@(Root y ys)
  | x <= y    = Root x (yr :< xs)
  | otherwise = Root y (xr :< ys)

merge 
    :: Ord a 
    => Binomial xs z a 
    -> Binomial ys z a 
    -> Binomial (Add False xs ys) z a
merge Nil ys              = ys
merge xs Nil              = xs
merge (Skip xs) (Skip ys) = Skip (merge xs ys)
merge (Skip xs) (y :- ys) = y :- merge xs ys
merge (x :- xs) (Skip ys) = x :- merge xs ys
merge (x :- xs) (y :- ys) = Skip (mergeCarry (mergeTree x y) xs ys)

mergeCarry
    :: Ord a
    => Tree rk a 
    -> Binomial xs rk a 
    -> Binomial ys rk a 
    -> Binomial (Add True xs ys) rk a
mergeCarry t Nil ys              = carryOne t ys
mergeCarry t xs Nil              = carryOne t xs
mergeCarry t (Skip xs) (Skip ys) = t :- merge xs ys
mergeCarry t (Skip xs) (y :- ys) = Skip (mergeCarry (mergeTree t y) xs ys)
mergeCarry t (x :- xs) (Skip ys) = Skip (mergeCarry (mergeTree t x) xs ys)
mergeCarry t (x :- xs) (y :- ys) = t :- mergeCarry (mergeTree x y) xs ys

carryOne 
    :: Ord a 
    => Tree rk a -> Binomial xs rk a -> Binomial (CarryOne xs) rk a
carryOne t Nil       = t :- Nil
carryOne t (Skip xs) = t :- xs
carryOne t (x :- xs) = Skip (carryOne (mergeTree t x) xs)

You’ll notice that no proofs are needed: that’s because the merge function itself is the same as the type family, like the way ++ for lists was the same as the Plus type family.

Of course, this structure is only verified insofar as you believe the type families. It does provide a degree of double-entry, though: any mistake in the type family will have to be mirrored in the merge function to type-check. On top of that, we can write some proofs of properties we might expect:

addCommutes
  :: The [Bool] xs
  -> The [Bool] ys
  -> Add False xs ys :~: Add False ys xs
addCommutes Nily _ = Refl
addCommutes _ Nily = Refl
addCommutes (Cony Falsy xs) (Cony Falsy ys) =
    gcastWith (addCommutes xs ys) Refl
addCommutes (Cony Truey xs) (Cony Falsy ys) =
    gcastWith (addCommutes xs ys) Refl
addCommutes (Cony Falsy xs) (Cony Truey ys) =
    gcastWith (addCommutes xs ys) Refl
addCommutes (Cony Truey xs) (Cony Truey ys) =
    gcastWith (addCommutesCarry xs ys) Refl

addCommutesCarry
  :: The [Bool] xs
  -> The [Bool] ys
  -> Add True xs ys :~: Add True ys xs
addCommutesCarry Nily _ = Refl
addCommutesCarry _ Nily = Refl
addCommutesCarry (Cony Falsy xs) (Cony Falsy ys) =
    gcastWith (addCommutes xs ys) Refl
addCommutesCarry (Cony Truey xs) (Cony Falsy ys) =
    gcastWith (addCommutesCarry xs ys) Refl
addCommutesCarry (Cony Falsy xs) (Cony Truey ys) =
    gcastWith (addCommutesCarry xs ys) Refl
addCommutesCarry (Cony Truey xs) (Cony Truey ys) =
    gcastWith (addCommutesCarry xs ys) Refl

Unfortunately, though, this method does require proofs (ugly proofs) for the delete-min operation. One of the issues is truncation: since the binary digits are stored least-significant-bit first, the same number can be represented with any number of trailing zeroes. This kept causing problems for me when it came to subtraction, and adding the requirement of no trailing zeroes (truncation) to the constructors for the heap was a pain, requiring extra proofs on merge to show that it preserves truncation.

Doubly-Dependent Types

Since some of these properties are much easier to verify on the type-level Peano numbers, one approach might be to convert back and forth between Peano numbers and binary, and use the proofs on Peano numbers instead.

type family BintoPeano (xs :: [Bool]) :: Peano where
        BintoPeano '[] = Z
        BintoPeano (False : xs) = BintoPeano xs + BintoPeano xs
        BintoPeano (True : xs) = S (BintoPeano xs + BintoPeano xs)

First problem: this requires UndecidableInstances. I’d really rather not have that turned on, to be honest. In Idris (and Agda), you can prove decidability using a number of different methods, but this isn’t available in Haskell yet.

Regardless, we can push on.

To go in the other direction, we’ll need to calculate the parity of natural numbers. Taken from the Idris tutorial:

data Parity (n :: Peano) where
    Even :: The Peano n -> Parity (n + n)
    Odd  :: The Peano n -> Parity (S (n + n))

parity :: The Peano n -> Parity n
parity Zy = Even Zy
parity (Sy Zy) = Odd Zy
parity (Sy (Sy n)) = case parity n of
  Even m -> gcastWith (plusSuccDistrib m m) (Even (Sy m))
  Odd  m -> gcastWith (plusSuccDistrib m m) (Odd (Sy m))

plusSuccDistrib :: The Peano n -> proxy m -> n + S m :~: S (n + m)
plusSuccDistrib Zy _ = Refl
plusSuccDistrib (Sy n) p = gcastWith (plusSuccDistrib n p) Refl

We need this function on the type-level, though, not the value-level: here, again, we run into trouble. What does gcastWith look like on the type-level? As far as I can tell, it doesn’t exist (yet. Although I haven’t looked deeply into the singletons library yet).

This idea of doing dependently-typed stuff on the type-level started to be possible with TypeInType. For instance, we could have defined our binary type as:

data Binary :: Peano -> Type where
    O :: Binary n -> Binary (n + n)
    I :: Binary n -> Binary (S (n + n))
    E :: Binary Z

And then the binomial heap as:

data Binomial (xs :: Binary n) (rk :: Peano) (a :: Type) where
       Nil :: Binomial E n a
       Skip :: Binomial xs (S rk) a -> Binomial (O xs) rk a
       (:-) :: Tree rk a 
            -> Binomial xs (S rk) a 
            -> Binomial (I xs) rk a

What we’re doing here is indexing a type by an indexed type. This wasn’t possible in Haskell a few years ago. It still doesn’t get us a nice definition of subtraction, though.

Using a Typechecker Plugin

It’s pretty clear that this approach gets tedious almost immediately. What’s more, if we want the proofs to be erased, we introduce potential for errors.

The solution? Beef up GHC’s typechecker with a plugin. I first came across this approach in Kenneth Foner’s talk at Compose. He used a plugin that called out to the Z3 theorem prover (from Diatchki 2015); I’ll use a simpler plugin which just normalizes type-literals.

From what I’ve used of these plugins so far, they seem to work really well. They’re very unobtrusive, only requiring a pragma at the top of your file:

{-# OPTIONS_GHC -fplugin GHC.TypeLits.Normalise #-}

The plugin is only called when GHC can’t unify two types: this means you don’t get odd-looking error messages in unrelated code (in fact, the error messages I’ve seen so far have been excellent—a real improvement on the standard error messages for type-level arithmetic). Another benefit is that we get to use type-level literals (Nat imported from GHC.TypeLits), rather then the noisy-looking type-level Peano numbers.

data Tree n a = Root a (Node n a)

data Node :: Nat -> Type -> Type where
        NilN :: Node 0 a
        (:<) :: {-# UNPACK #-} !(Tree n a)
             -> Node n a
             -> Node (1 + n) a

mergeTree :: Ord a => Tree n a -> Tree n a -> Tree (1 + n) a
mergeTree xr@(Root x xs) yr@(Root y ys)
  | x <= y    = Root x (yr :< xs)
  | otherwise = Root y (xr :< ys)

infixr 5 :-
data Binomial :: Nat -> Nat -> Type -> Type where
        Nil  :: Binomial n 0 a
        (:-) :: {-# UNPACK #-} !(Tree z a)
             -> Binomial (1 + z) xs a
             -> Binomial z (1 + xs + xs) a
        Skip :: Binomial (1 + z) (1 + xs) a
             -> Binomial z (2 + xs + xs) a

This definition also ensures that the binomial heap has no trailing zeroes in its binary representation: the Skip constructor can only be applied to a heap bigger than zero.

Since we’re going to be looking at several different heaps, we’ll need a class to represent all of them:

class IndexedQueue h a where

    {-# MINIMAL insert, empty, minViewMay, minView #-}

    empty
        :: h 0 a

    minView
        :: h (1 + n) a -> (a, h n a)

    singleton
        :: a -> h 1 a
    singleton = flip insert empty

    insert
        :: a -> h n a -> h (1 + n) a

    minViewMay
       :: h n a
       -> (n ~ 0 => b)
       -> (forall m. (1 + m) ~ n => a -> h m a -> b)
       -> b

class IndexedQueue h a =>
      MeldableIndexedQueue h a where
    merge
        :: h n a -> h m a -> h (n + m) a

You’ll need MultiParamTypeClasses for this one.

mergeB
    :: Ord a
    => Binomial z xs a -> Binomial z ys a -> Binomial z (xs + ys) a
mergeB Nil ys              = ys
mergeB xs Nil              = xs
mergeB (Skip xs) (Skip ys) = Skip (mergeB xs ys)
mergeB (Skip xs) (y :- ys) = y :- mergeB xs ys
mergeB (x :- xs) (Skip ys) = x :- mergeB xs ys
mergeB (x :- xs) (y :- ys) = Skip (mergeCarry (mergeTree x y) xs ys)

mergeCarry
    :: Ord a
    => Tree z a
    -> Binomial z xs a
    -> Binomial z ys a
    -> Binomial z (1 + xs + ys) a
mergeCarry !t Nil ys              = carryOne t ys
mergeCarry !t xs Nil              = carryOne t xs
mergeCarry !t (Skip xs) (Skip ys) = t :- mergeB xs ys
mergeCarry !t (Skip xs) (y :- ys) = Skip (mergeCarry (mergeTree t y) xs ys)
mergeCarry !t (x :- xs) (Skip ys) = Skip (mergeCarry (mergeTree t x) xs ys)
mergeCarry !t (x :- xs) (y :- ys) = t :- mergeCarry (mergeTree x y) xs ys

carryOne :: Ord a => Tree z a -> Binomial z xs a -> Binomial z (1 + xs) a
carryOne !t Nil       = t :- Nil
carryOne !t (Skip xs) = t :- xs
carryOne !t (x :- xs) = Skip (carryOne (mergeTree t x) xs)

instance Ord a => MeldableIndexedQueue (Binomial 0) a where
    merge = mergeB
    {-# INLINE merge #-}

instance Ord a => IndexedQueue (Binomial 0) a where
    empty = Nil
    singleton x = Root x NilN :- Nil
    insert = merge . singleton

(BangPatterns for this example)

On top of that, it’s very easy to define delete-min:

    minView xs = case minViewZip xs of
      Zipper x _ ys -> (x, ys)
    minViewMay q b f = case q of
      Nil -> b
      _ :- _ -> uncurry f (minView q)
      Skip _ -> uncurry f (minView q)

data Zipper a n rk = Zipper !a (Node rk a) (Binomial rk n a)

skip :: Binomial (1 + z) xs a -> Binomial z (xs + xs) a
skip x = case x of
  Nil    -> Nil
  Skip _ -> Skip x
  _ :- _ -> Skip x

data MinViewZipper a n rk where
    Infty :: MinViewZipper a 0 rk
    Min :: {-# UNPACK #-} !(Zipper a n rk) -> MinViewZipper a (n+1) rk

slideLeft :: Zipper a n (1 + rk) -> Zipper a (1 + n + n) rk
slideLeft (Zipper m (t :< ts) hs)
  = Zipper m ts (t :- hs)

pushLeft 
  :: Ord a 
  => Tree rk a 
  -> Zipper a n (1 + rk) 
  -> Zipper a (2 + n + n) rk
pushLeft c (Zipper m (t :< ts) hs)
  = Zipper m ts (Skip (carryOne (mergeTree c t) hs))

minViewZip :: Ord a => Binomial rk (1 + n) a -> Zipper a n rk
minViewZip (Skip xs) = slideLeft (minViewZip xs)
minViewZip (t@(Root x ts) :- f) = case minViewZipMay f of
  Min ex@(Zipper minKey _ _) | minKey < x -> pushLeft t ex
  _                          -> Zipper x ts (skip f)

minViewZipMay :: Ord a => Binomial rk n a -> MinViewZipper a n rk
minViewZipMay (Skip xs) = Min (slideLeft (minViewZip xs))
minViewZipMay Nil = Infty
minViewZipMay (t@(Root x ts) :- f) = Min $ case minViewZipMay f of
  Min ex@(Zipper minKey _ _) | minKey < x -> pushLeft t ex
  _                          -> Zipper x ts (skip f)

Similarly, compare the version of the pairing heap with the plugin:

data Heap n a where
  E :: Heap 0 a
  T :: a -> HVec n a -> Heap (1 + n) a

data HVec n a where
  HNil :: HVec 0 a
  HCons :: Heap m a -> HVec n a -> HVec (m + n) a

insert :: Ord a => a -> Heap n a -> Heap (1 + n) a
insert x xs = merge (T x HNil) xs

merge :: Ord a => Heap m a -> Heap n a -> Heap (m + n) a
merge E ys = ys
merge xs E = xs
merge h1@(T x xs) h2@(T y ys)
  | x <= y = T x (HCons h2 xs)
  | otherwise = T y (HCons h1 ys)

minView :: Ord a => Heap (1 + n) a -> (a, Heap n a)
minView (T x hs) = (x, mergePairs hs)

mergePairs :: Ord a => HVec n a -> Heap n a
mergePairs HNil = E
mergePairs (HCons h HNil) = h
mergePairs (HCons h1 (HCons h2 hs)) =
    merge (merge h1 h2) (mergePairs hs)

To the version without the plugin:

data Heap n a where
  E :: Heap Z a
  T :: a -> HVec n a -> Heap (S n) a

data HVec n a where
  HNil :: HVec Z a
  HCons :: Heap m a -> HVec n a -> HVec (m + n) a

class Sized h where
  size :: h n a -> The Peano n

instance Sized Heap where
  size E = Zy
  size (T _ xs) = Sy (size xs)

plus :: The Peano n -> The Peano m -> The Peano (n + m)
plus Zy m = m
plus (Sy n) m = Sy (plus n m)

instance Sized HVec where
  size HNil = Zy
  size (HCons h hs) = size h `plus` size hs

insert :: Ord a => a -> Heap n a -> Heap (S n) a
insert x xs = merge (T x HNil) xs

merge :: Ord a => Heap m a -> Heap n a -> Heap (m + n) a
merge E ys = ys
merge xs E = case plusZero (size xs) of Refl -> xs
merge h1@(T x xs) h2@(T y ys)
  | x <= y = case plusCommutative (size h2) (size xs) of
                    Refl -> T x (HCons h2 xs)
  | otherwise = case plusSuccDistrib (size xs) (size ys) of
                    Refl -> T y (HCons h1 ys)

minView :: Ord a => Heap (S n) a -> (a, Heap n a)
minView (T x hs) = (x, mergePairs hs)

mergePairs :: Ord a => HVec n a -> Heap n a
mergePairs HNil = E
mergePairs (HCons h HNil) = case plusZero (size h) of Refl -> h
mergePairs (HCons h1 (HCons h2 hs)) =
  case plusAssoc (size h1) (size h2) (size hs) of
    Refl -> merge (merge h1 h2) (mergePairs hs)

Leftist Heaps

The typechecker plugin makes it relatively easy to implement several other heaps: skew, Braun, etc. You’ll need one extra trick to implement a leftist heap, though. Let’s take a look at the unverified version:

data Leftist a
    = Leaf
    | Node {-# UNPACK #-} !Int
           a
           (Leftist a)
           (Leftist a)

rank :: Leftist s -> Int
rank Leaf          = 0
rank (Node r _ _ _) = r
{-# INLINE rank #-}

mergeL :: Ord a => Leftist a -> Leftist a -> Leftist a
mergeL Leaf h2 = h2
mergeL h1 Leaf = h1
mergeL h1@(Node w1 p1 l1 r1) h2@(Node w2 p2 l2 r2)
  | p1 < p2 =
      if ll <= lr
          then LNode (w1 + w2) p1 l1 (mergeL r1 h2)
          else LNode (w1 + w2) p1 (mergeL r1 h2) l1
  | otherwise =
      if rl <= rr
          then LNode (w1 + w2) p2 l2 (mergeL r2 h1)
          else LNode (w1 + w2) p2 (mergeL r2 h1) l2
  where
    ll = rank r1 + w2
    lr = rank l1
    rl = rank r2 + w1
    rr = rank l2

In a weight-biased leftist heap, the left branch in any tree must have at least as many elements as the right branch. Ideally, we would encode that in the representation of size-indexed leftist heap:

data Leftist n a where
        Leaf :: Leftist 0 a
        Node :: !(The Nat (n + m + 1))
             -> a
             -> Leftist n a
             -> Leftist m a
             -> !(m <= n)
             -> Leftist (n + m + 1) a

rank :: Leftist n s -> The Nat n
rank Leaf             = sing
rank (Node r _ _ _ _) = r
{-# INLINE rank #-}

Two problems, though: first of all, we need to be able to compare the sizes of two heaps, in the merge function. If we were using the type-level Peano numbers, this would be too slow. More importantly, though, we need the comparison to provide a proof of the ordering, so that we can use it in the resulting Node constructor.

Integer-Backed Type-Level Numbers

In Agda, the Peano type is actually backed by Haskell’s Integer at runtime. This allows compile-time proofs to be written about values which are calculated efficiently. We can mimic the same thing in Haskell with a newtype wrapper around Integer with a phantom Peano parameter, if we promise to never put an integer in which has a different value to its phantom value. We can make this promise a little more trustworthy if we don’t export the newtype constructor.

newtype instance The Nat n where
        NatSing :: Integer -> The Nat n

instance KnownNat n => KnownSing n where
    sing = NatSing $ Prelude.fromInteger $ natVal (Proxy :: Proxy n)

FlexibleInstances is needed for the instance. We can also encode all the necessary arithmetic:

infixl 6 +.
(+.) :: The Nat n -> The Nat m -> The Nat (n + m)
(+.) =
    (coerce :: (Integer -> Integer -> Integer) 
            -> The Nat n -> The Nat m -> The Nat (n + m))
        (+)
{-# INLINE (+.) #-}

Finally, the compare function (ScopedTypeVariables for this):

infix 4 <=.
(<=.) :: The Nat n -> The Nat m -> The Bool (n <=? m)
(<=.) (NatSing x :: The Nat n) (NatSing y :: The Nat m)
  | x <= y = 
      case (unsafeCoerce (Refl :: True :~: True) :: (n <=? m) :~: True) of
        Refl -> Truey
  | otherwise = 
      case (unsafeCoerce (Refl :: True :~: True) :: (n <=? m) :~: False) of
        Refl -> Falsy
{-# INLINE (<=.) #-}

totalOrder ::  p n -> q m -> (n <=? m) :~: False -> (m <=? n) :~: True
totalOrder (_ :: p n) (_ :: q m) Refl = 
    unsafeCoerce Refl :: (m <=? n) :~: True

type x <= y = (x <=? y) :~: True

It’s worth mentioning that all of these functions are somewhat axiomatic: there’s no checking of these definitions going on, and any later proofs are only correct in terms of these functions.

If we want our merge function to really look like the non-verified version, though, we’ll have to mess around with the syntax a little.

A Dependent if-then-else

When matching on a singleton, within the case-match, proof of the singleton’s type is provided. For instance:

type family IfThenElse (c :: Bool) (true :: k) (false :: k) :: k
     where
        IfThenElse True true false = true
        IfThenElse False true false = false

intOrString :: The Bool cond -> IfThenElse cond Int String
intOrString Truey = 1
intOrString Falsy = "abc"

In Haskell, since we can overload the if-then-else construct (with RebindableSyntax), we can provide the same syntax, while hiding the dependent nature:

ifThenElse :: The Bool c -> (c :~: True -> a) -> (c :~: False -> a) -> a
ifThenElse Truey t _ = t Refl
ifThenElse Falsy _ f = f Refl

Verified Merge

Finally, then, we can write the implementation for merge, which looks almost exactly the same as the non-verified merge:

instance Ord a => IndexedQueue Leftist a where

    minView (Node _ x l r _) = (x, merge l r)
    {-# INLINE minView #-}


    singleton x = Node sing x Leaf Leaf Refl
    {-# INLINE singleton #-}

    empty = Leaf
    {-# INLINE empty #-}

    insert = merge . singleton
    {-# INLINE insert #-}

    minViewMay Leaf b _             = b
    minViewMay (Node _ x l r _) _ f = f x (merge l r)

instance Ord a =>
         MeldableIndexedQueue Leftist a where
    merge Leaf h2 = h2
    merge h1 Leaf = h1
    merge h1@(Node w1 p1 l1 r1 _) h2@(Node w2 p2 l2 r2 _)
      | p1 < p2 =
          if ll <=. lr
             then Node (w1 +. w2) p1 l1 (merge r1 h2)
             else Node (w1 +. w2) p1 (merge r1 h2) l1 . totalOrder ll lr
      | otherwise =
          if rl <=. rr
              then Node (w1 +. w2) p2 l2 (merge r2 h1)
              else Node (w1 +. w2) p2 (merge r2 h1) l2 . totalOrder rl rr
      where
        ll = rank r1 +. w2
        lr = rank l1
        rl = rank r2 +. w1
        rr = rank l2
    {-# INLINE merge #-}

What’s cool about this implementation is that it has the same performance as the non-verified version (if Integer is swapped out for Int, that is), and it looks pretty much the same. This is very close to static verification for free.

Generalizing Sort to Parts

The Sort type used in the original blog post can be generalized to any indexed container.

data Parts f g a b r where
    Parts :: (forall n. g (m + n) b -> (g n b, r))
         -> !(f m a)
         -> Parts f g a b r

instance Functor (Parts f g a b) where
  fmap f (Parts g h) =
    Parts (\h' -> case g h' of (remn, r) -> (remn, f r)) h
  {-# INLINE fmap #-}

instance (IndexedQueue f x, MeldableIndexedQueue f x) =>
          Applicative (Parts f g x y) where
    pure x = Parts (\h -> (h, x)) empty
    {-# INLINE pure #-}

    (Parts f (xs :: f m x) :: Parts f g x y (a -> b)) <*> 
      Parts g (ys :: f n x) =
        Parts h (merge xs ys)
        where
          h :: forall o . g ((m + n) + o) y -> (g o y, b)
          h v = case f v of { (v', a) ->
                    case g v' of { (v'', b) ->
                      (v'', a b)}}
    {-# INLINABLE (<*>) #-}

This version doesn’t insist that you order the elements of the heap in any particular way: we could use indexed difference lists to reverse a container, or indexed lists to calculate permutations of a container, for instance.

Other Uses For Size-Indexed Heaps

I’d be very interested to see any other uses of these indexed heaps, if anyone has any ideas. Potentially the could be used in any place where there is a need for some heap which is known to be of a certain size (a true prime sieve, for instance).

The Library

I’ve explored all of these ideas here. It has implementations of all the heaps I mentioned, as well as the index-erasing type, and a size-indexed list, for reversing traversables. In the future, I might add things like a Fibonacci heap, or the optimal Brodal/Okasaki heap (Brodal and Okasaki 1996).

Brodal, Gerth Stølting, and Chris Okasaki. 1996. “Optimal Purely Functional Priority Queues.” Journal of Functional Programming 6 (6) (November): 839–857. doi:10.1017/S095679680000201X. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.973.

Diatchki, Iavor S. 2015. “Improving Haskell Types with SMT.” In Proceedings of the 2015 ACM SIGPLAN Symposium on Haskell, 1–10. Haskell ’15. New York, NY, USA: ACM. doi:10.1145/2804302.2804307. http://yav.github.io/publications/improving-smt-types.pdf.

Eisenberg, Richard A., and Stephanie Weirich. 2012. “Dependently Typed Programming with Singletons.” In Proceedings of the 2012 Haskell Symposium, 117–130. Haskell ’12. New York, NY, USA: ACM. doi:10.1145/2364506.2364522. http://cs.brynmawr.edu/~rae/papers/2012/singletons/paper.pdf.

Fredman, Michael L., Robert Sedgewick, Daniel D. Sleator, and Robert E. Tarjan. 1986. “The pairing heap: A new form of self-adjusting heap.” Algorithmica 1 (1-4) (January): 111–129. doi:10.1007/BF01840439. http://www.cs.princeton.edu/courses/archive/fall09/cos521/Handouts/pairingheaps.pdf.

Hinze, Ralf. 1999. “Functional Pearls: Explaining Binomial Heaps.” Journal of Functional Programming 9 (1) (January): 93–104. doi:10.1017/S0956796899003317. http://www.cs.ox.ac.uk/ralf.hinze/publications/#J1.

———. 2001. “Manufacturing datatypes.” Journal of Functional Programming 11 (5) (September): 493–524. doi:10.1017/S095679680100404X. http://www.cs.ox.ac.uk/ralf.hinze/publications/#J6.

Wasserman, Louis. 2010. “Playing with Priority Queues.” The Monad.Reader 16 (16) (May): 37. https://themonadreader.files.wordpress.com/2010/05/issue16.pdf.

Unparsing

Donnacha Oisín Kidney — Sat, 01 Apr 2017 00:00:00 UT

Posted on April 1, 2017

Tags: Haskell

Pretty-printing expressions with minimal parenthesis is a little trickier than it looks. This algorithm is adapted from:

Ramsey, Norman. ‘Unparsing Expressions With Prefix and Postfix Operators’. Software—Practice & Experience 28, no. 12 (1998): 1327–1356.

{-# LANGUAGE DeriveFunctor #-}

module Unparse where

import Data.Semigroup
import Data.Coerce

data Side
    = L
    | R
    deriving Eq

data ShowExpr t e
    = Lit     {repr :: t}
    | Prefix  {repr :: t, op :: Op, child :: e}
    | Postfix {repr :: t, op :: Op, child :: e}
    | Binary  {repr :: t, op :: Op, lchild :: e, rchild :: e}
    deriving Functor

data Op = Op
    { prec :: Int
    , assoc :: Side
    }

showExpr
    :: Semigroup t
    => (e -> ShowExpr t e) -> (t -> t) -> e -> t
showExpr proj prns = go . fmap proj . proj
  where
    go (Lit t) = t
    go (Prefix t o y) = t <> ifPrns R o (getOp y) (go (fmap proj y))
    go (Postfix t o x) = ifPrns L o (getOp x) (go (fmap proj x)) <> t
    go (Binary t o x y) =
        ifPrns L o (getOp x) (go (fmap proj x)) <> t <>
        ifPrns R o (getOp y) (go (fmap proj y))
    ifPrns sid (Op op oa) (Just (Op ip ia))
      | ip < op || ip == op && (ia /= oa || sid /= oa) = prns
    ifPrns _ _ _ = id
    getOp Lit{} = Nothing
    getOp e = Just (op e)

showSExpr :: (e -> ShowExpr ShowS e) -> e -> ShowS
showSExpr proj =
    appEndo .
    showExpr
        (coerce proj)
        (coerce (showParen True))

And an example of its use:

data Expr = Number Integer
          | Expr :+: Expr
          | Expr :*: Expr
          | Expr :^: Expr

instance Num Expr where
  (+) = (:+:)
  (*) = (:*:)
  fromInteger = Number
  abs = undefined
  signum = undefined
  negate = undefined

-- | >>> default (Expr)
--
-- >>> 1 + 2 + 3
-- 1 + 2 + 3
--
-- >>> 1 * 2 * 3
-- 1 * 2 * 3
--
-- >>> (1 * 2) + 3
-- 1 * 2 + 3
--
-- >>> 1 * (2 + 3)
-- 1 * (2 + 3)
--
-- >>> (1 :^: 2) :^: 3
-- (1 ^ 2) ^ 3
--
-- >>> 1 :^: (2 :^: 3)
-- 1 ^ 2 ^ 3
instance Show Expr where
  showsPrec _ = showSExpr proj where
    proj (Number n) = Lit (shows n)
    proj (x :+: y) = Binary (showString " + ") (Op 3 L) x y
    proj (x :*: y) = Binary (showString " * ") (Op 4 L) x y
    proj (x :^: y) = Binary (showString " ^ ") (Op 5 R) x y

Fun with Recursion Schemes

Donnacha Oisín Kidney — Thu, 30 Mar 2017 00:00:00 UT

Posted on March 30, 2017

Tags: Haskell, Recursion Schemes

Folding Algebras

I saw this post on reddit recently, and it got me thinking about recursion schemes. One of the primary motivations behind them is the reduction of boilerplate. The classic example is evaluation of arithmetic expressions:

data ExprF a
  = LitF Integer
  | (:+:) a a
  | (:*:) a a
  deriving Functor

type Expr = Fix ExprF

eval :: Expr -> Integer
eval = unfix >>> \case
  LitF n -> n
  x :+: y -> eval x + eval y
  x :*: y -> eval x * eval y

The calls to eval are the boilerplate: this is where the main recursion scheme, cata can help.

evalF :: Expr -> Integer
evalF = cata $ \case
  LitF n -> n
  x :+: y -> x + y
  x :*: y -> x * y

I still feel like there’s boilerplate, though. Ideally I’d like to write this:

evalF :: Expr -> Integer
evalF = cata $ ??? $ \case
  Lit -> id
  Add -> (+)
  Mul -> (*)

The ??? needs to be filled in. It’s a little tricky, though: the type of the algebra changes depending on what expression it’s given. GADTs will allow us to attach types to cases:

data ExprI a r f where
  Lit :: ExprI a b (Integer -> b)
  Add :: ExprI a b (a -> a -> b)
  Mul :: ExprI a b (a -> a -> b)

The first type parameter is the same as the first type parameter to ExprF. The second is the output type of the algebra, and the third is the type of the fold required to produce that output type. The third type parameter depends on the case matched in the GADT. Using this, we can write a function which converts a fold/pattern match to a standard algebra:

foldAlg :: (forall f. ExprI a r f -> f) -> (ExprF a -> r)
foldAlg f (LitF i)  = f Lit i
foldAlg f (x :+: y) = f Add x y
foldAlg f (x :*: y) = f Mul x y

And finally, we can write the nice evaluation algebra:

evalF :: Expr -> Integer
evalF = cata $ foldAlg $ \case
  Lit -> id
  Add -> (+)
  Mul -> (*)

I hacked together some quick template Haskell to generate the matchers over here. It uses a class AsPatternFold:

class AsPatternFold x f | x -> f where
  foldMatch :: (forall a. f r a -> a) -> (x -> r)

And you generate the extra data type, with an instance, by doing this:

makePatternFolds ''ExprF

The code it generates can be used like this:

evalF :: Expr -> Integer
evalF = cata $ foldMatch $ \case
  LitI -> id
  (:+|) -> (+)
  (:*|) -> (*)

It’s terribly hacky at the moment, I may clean it up later.

Record Case

There’s another approach to the same idea that is slightly more sensible, using record wildcards. You define a handler for you datatype (an algebra):

data ExprAlg a r
  = ExprAlg
  { litF :: Integer -> r
  , (+:) :: a -> a -> r
  , (*:) :: a -> a -> r }

Then, to use it, you define how to interact between the handler and the datatype, like before. The benefit is that record wildcard syntax allows you to piggy back on the function definition syntax, like so:

data ExprF a
  = LitF Integer
  | (:+:) a a
  | (:*:) a a

makeHandler ''ExprF

exprAlg :: ExprF Integer -> Integer
exprAlg = index ExprFAlg {..} where
  litF = id
  (+:) = (+)
  (*:) = (*)

This approach is much more principled: the index function, for example, comes from the adjunctions package, from the Representable class. That’s because those algebras are actually representable functors, with their representation being the thing they match. They also conform to a whole bunch of things automatically, letting you combine them interesting ways.

Printing Expressions

Properly printing expressions, with minimal parentheses, is a surprisingly difficult problem. Ramsey (1998) provides a solution of the form:

isParens side (Assoc ao po) (Assoc ai pi) =
  pi <= po && (pi /= po || ai /= ao || ao /= side)

Using this, we can write an algebra for printing expressions. It should work in the general case, not just on the expression type defined above, so we need to make another unfixed functor to describe the printing of an expression:

data Side = L | R deriving Eq

data ShowExpr t e
  = ShowLit { _repr :: t }
  | Prefix  { _repr :: t, _assoc :: (Int,Side), _child  :: e }
  | Postfix { _repr :: t, _assoc :: (Int,Side), _child  :: e }
  | Binary  { _repr :: t, _assoc :: (Int,Side), _lchild :: e
                                              , _rchild :: e }
  deriving Functor
  
makeLenses ''ShowExpr

The lenses are probably overkill. For printing, we need not only the precedence of the current level, but also the precedence one level below. Seems like the perfect case for a zygomorphism:

showExprAlg :: Semigroup t
            => (t -> t)
            -> ShowExpr t (Maybe (Int,Side), t)
            -> t
showExprAlg prns = \case 
    ShowLit t               ->                   t
    Prefix  t s       (q,y) ->                   t <> ifPrns R s q y
    Postfix t s (p,x)       -> ifPrns L s p x <> t
    Binary  t s (p,x) (q,y) -> ifPrns L s p x <> t <> ifPrns R s q y
  where
    ifPrns sid (op,oa) (Just (ip,ia))
      | ip < op || ip == op && (ia /= oa || sid /= oa) = prns
    ifPrns _ _ _ = id

The first argument to this algebra is the parenthesizing function. This algebra works fine for when the ShowExpr type is already constructed:

showExpr' :: Semigroup t => (t -> t) -> Fix (ShowExpr t) -> t
showExpr' = zygo (preview assoc) . showExprAlg

But we still need to construct the ShowExpr from something else first. hylo might be a good fit:

hylo :: Functor f => (f b -> b) -> (a -> f a) -> a -> b

But that performs a catamorphism after an anamorphism, and we want a zygomorphism after an anamorphism. Luckily, the recursion-schemes library is constructed in such a way that different schemes can be stuck together relatively easily:

hylozygo
    :: Functor f
    => (f a -> a) -> (f (a, b) -> b) -> (c -> f c) -> c -> b
hylozygo x y z = ghylo (distZygo x) distAna y (fmap Identity . z)

showExpr :: Semigroup t
         => (t -> t)
         -> (e -> ShowExpr t e)
         -> e -> t
showExpr = hylozygo (preview assoc) . showExprAlg

Let’s try it out, with a right-associative operator this time to make things more difficult:

data ExprF a
  = LitF Integer
  | (:+:) a a
  | (:*:) a a
  | (:^:) a a
  deriving Functor

makeHandler ''ExprF

newtype Expr = Expr { runExpr :: ExprF Expr }

instance Num Expr where
  fromInteger = Expr . LitF
  x + y = Expr (x :+: y)
  x * y = Expr (x :*: y)
  
infixr 8 ^*
(^*) :: Expr -> Expr -> Expr
x ^* y = Expr (x :^: y)

instance Show Expr where
  show =
    showExpr
      (\x -> "(" ++ x ++ ")")
      (index ExprFAlg {..} . runExpr)
    where
      litF = ShowLit . show
      (+:) = Binary " + " (6,L)
      (*:) = Binary " * " (7,L)
      (^:) = Binary " ^ " (8,R)

Since we only specified Semigroup in the definition of showExpr, we can use the more efficient difference-list definition of Show:

instance Show Expr where
    showsPrec _ =
      appEndo . showExpr
        (Endo . showParen True . appEndo)
        (index ExprFAlg {..} . runExpr)
      where
        litF = ShowLit . Endo . shows
        (+:) = Binary (Endo (" + " ++)) (6,L)
        (*:) = Binary (Endo (" * " ++)) (7,L)
        (^:) = Binary (Endo (" ^ " ++)) (8,R)

1 ^* 2 ^* 3         -- 1 ^ 2 ^ 3
(1 ^* 2) ^* 3       -- (1 ^ 2) ^ 3
1 * 2 + 3   :: Expr -- 1 * 2 + 3
1 * (2 + 3) :: Expr -- 1 * (2 + 3)

Ramsey, Norman. 1998. “Unparsing Expressions With Prefix and Postfix Operators.” Software—Practice & Experience 28 (12): 1327–1356. http://www.cs.tufts.edu/%7Enr/pubs/unparse-abstract.html.

Constrained Applicatives

Donnacha Oisín Kidney — Wed, 08 Mar 2017 00:00:00 UT

Posted on March 8, 2017

Tags: Haskell, Applicative

In Haskell restricted monads are monads which can’t contain every type. Set is a good example. If you look in the documentation for Data.Set you’ll see several functions which correspond to functions in the Functor/Applicative/Monad typeclass hierarchy:

map :: Ord b => (a -> b) -> Set a -> Set b
singleton :: a -> Set a
foldMap :: Ord b => (a -> Set b) -> Set a -> Set b -- specialized

Unfortunately, though, Set can’t conform to Functor, because the signature of fmap looks like this:

fmap :: Functor f => (a -> b) -> f a -> f b

It doesn’t have an Ord constraint.

This is annoying: when using Set, lots of things have to be imported qualified, and you have to remember the slightly different names of extra functions like map. More importantly, you’ve lost the ability to write generic code over Functor or Monad which will work on Set.

There are a number of ways to get around this problem. Here, an approach using reflection-reification is explored. These are the types involved:

newtype SetC a = 
       SetC{unSetC :: forall r. Ord r => (a -> Set r) -> Set r}

reifySet :: Ord r => SetC r -> Set r
reifySet m = unSetC m singleton

reflectSet :: Ord r => Set r -> SetC r
reflectSet s = SetC $ \k -> S.foldr (\x r -> k x `union` r) S.empty s

SetC is just Cont in disguise. In fact, we can generalize this pattern, using Constraint Kinds:

newtype FreeT c m a = 
       FreeT { runFreeT :: forall r. c r => (a -> m r) -> m r}

reifySet :: Ord a => FreeT Ord Set a -> Set a
reifySet m = runFreeT m singleton

reflectSet :: Set r -> FreeT Ord Set r
reflectSet s = FreeT $ \k -> S.foldr (\x r -> k x `union` r) S.empty s

FreeT looks an awful lot like ContT by now. The type has some other interesting applications, though. For instance, this type:

type FM = FreeT Monoid Identity

Is the free monoid. If we use a transformers-style type synonym, the naming becomes even nicer:

type Free c = FreeT c Identity

runFree :: c r => Free c a -> (a -> r) -> r
runFree xs f = runIdentity (runFreeT xs (pure . f))

instance Foldable (Free Monoid) where
  foldMap = flip runFree

Check out this package for an implementation of the non-transformer Free.

Different Classes

This is still unsatisfying, though. Putting annotations around your code feels inelegant. The next solution is to replace the monad class altogether with our own, and turn on -XRebindableSyntax. There are a few ways to design this new class. One option is to use multi-parameter type classes. Another solution is with an associated type:

class Functor f where
  type Suitable f a :: Constraint
  fmap :: Suitable f b => (a -> b) -> f a -> f b

This is similar to the approach taken in the rmonad library, except that library doesn’t use constraint kinds (they weren’t available when the library was made), so it has to make do with a Suitable class. Also, the signature for fmap in rmonad is:

fmap :: (Suitable f a, Suitable f b) => (a -> b) -> f a -> f b

I don’t want to constrain a: I figure if you can get something into your monad, it must be suitable. And I really want to reduce the syntactic overhead of writing extra types next to your functions.

There’s also the supermonad library out there which is much more general than any of these examples: it supports indexed monads as well as constrained.

Anyway,Monad is defined similarly to Functor:

class Functor m => Monad m where
  return :: Suitable m a => a -> m a
  (>>=) :: Suitable m b => m a -> (a -> m b) -> m b

Again, I want to minimize the use of Suitable, so for >>= there’s only a constraint on b.

Finally, here’s the Set instance:

instance Functor Set where
    type Suitable Set a = Ord a
    fmap = Set.map

Monomorphic

With equality constraints, you can actually make monomorphic containers conform to these classes (or, at least, wrappers around them).

import qualified Data.Text as Text

data Text a where
  Text :: Text.Text -> Text Char

instance Functor Text where
  type Suitable Text a = a ~ Char
  fmap f (Text xs) = Text (Text.map f xs)

This pattern can be generalized with some more GADT magic:

data Monomorphic xs a b where
        Monomorphic :: (a ~ b) => xs -> Monomorphic xs a b

instance (MonoFunctor xs, a ~ Element xs) => Functor (Monomorphic xs a) where
  type Suitable (Monomorphic xs a) b = a ~ b
  fmap f (Monomorphic xs) = Monomorphic (omap f xs)

Where omap comes from the mono-traversable package. You could go a little further, to Foldable:

instance (MonoFoldable xs, element ~ Element xs) =>
         Foldable (Monomorphic xs element) where
    foldr f b (Monomorphic xs) = ofoldr f b xs
    foldMap f (Monomorphic xs) = ofoldMap f xs
    foldl' f b (Monomorphic xs) = ofoldl' f b xs
    toList (Monomorphic xs) = otoList xs
    null (Monomorphic xs) = onull xs
    length (Monomorphic xs) = olength xs
    foldr1 f (Monomorphic xs) = ofoldr1Ex f xs
    elem x (Monomorphic xs) = oelem x xs
    maximum (Monomorphic xs) = maximumEx xs
    minimum (Monomorphic xs) = minimumEx xs
    sum (Monomorphic xs) = osum xs
    product (Monomorphic xs) = oproduct xs

Back to normal

Changing the FreeT type above a little, we can go back to normal functors and monads, and write more general reify and reflect functions:

newtype FreeT m a = 
       FreeT { runFreeT :: forall r. Suitable m r => (a -> m r) -> m r}
       
reify :: (Monad m, Suitable m a) => FreeT m a -> m a
reify = flip runFreeT return

reflect :: Monad m => m a -> FreeT m a
reflect x = FreeT (x >>=)

So now our types, when wrapped, can conform to the Prelude’s Functor. It would be nice if this type could be written like so:

reify :: Monad m => FreeT (Suitable m) m a -> m a
reify = flip runFreeT return

reflect :: Monad m => m a -> FreeT (Suitable m) m a
reflect x = FreeT (x >>=)

But unfortunately type families cannot be partially applied.

Applicatives

The classes above aren’t very modern: they’re missing applicative. This one is tricky:

class Functor f => Applicative f where
  pure :: Suitable a => a -> f a
  (<*>) :: Suitable f b => f (a -> b) -> f a -> f b

The issue is f (a -> b). There’s no way you’re getting some type like that into Set. This means that <*> is effectively useless. No problem, you think: define liftA2 instead:

class Functor f => Applicative f where
  pure :: Suitable a => a -> f a
  liftA2 :: Suitable f c => (a -> b -> c) -> f a -> f b -> f c

(<*>) :: (Applicative f, Suitable f b) => f (a -> b) -> f a -> f b
(<*>) = liftA2 ($)

Great! Now we can use it with set. However, there’s no way (that I can see) to define the other lift functions: liftA3, etc. Of course, if >>= is available, it’s as simple as:

liftA3 f xs ys zs = do
  x <- xs
  y <- ys
  z <- zs
  pure (f x y z)

But now we can’t define it for non-monadic applicatives (square matrices, ZipLists, etc.). This also forces us to use >>= when <*> may have been more efficient.

The functions we’re interested in defining look like this:

liftA2 :: Suitable f c => (a -> b -> c) -> f a -> f b -> f c
liftA3 :: Suitable f d => (a -> b -> c -> d) -> f a -> f b -> f c -> f d
liftA4 :: Suitable f e => (a -> b -> c -> d -> e) -> f a -> f b -> f c -> f d -> f e

There’s a clear pattern, but no obvious way to abstract over it. Type-level shenanigans to the rescue!

The pattern might be expressed like this:

liftA :: Func args -> Func lifted args

We can store these types as heterogeneous lists:

infixr 5 :-
data Vect xs where
  Nil  :: Vect '[]
  (:-) :: x -> Vect xs -> Vect (x ': xs)

infixr 5 :*
data AppVect f xs where
  NilA :: AppVect f '[]
  (:*) :: f x -> AppVect f xs -> AppVect f (x ': xs)

And liftA can be represented like this:

liftA
    :: Suitable f b
    => (Vect xs -> b) -> AppVect f xs -> f b

liftA2
    :: Suitable f c
    => (a -> b -> c) -> f a -> f b -> f c
liftA2 f xs ys =
    liftA
        (\(x :- y :- Nil) ->
              f x y)
        (xs :* ys :* NilA)

liftA3
    :: Suitable f d
    => (a -> b -> c -> d) -> f a -> f b -> f c -> f d
liftA3 f xs ys zs =
    liftA
        (\(x :- y :- z :- Nil) ->
              f x y z)
        (xs :* ys :* zs :* NilA)

Cool! For unrestricted applicatives, we can define liftA in terms of <*>:

liftAP :: (Prelude.Applicative f) 
       => (Vect xs -> b) -> (AppVect f xs -> f b)
liftAP f NilA = Prelude.pure (f Nil)
liftAP f (x :* NilA) 
  = Prelude.fmap (f . (:-Nil)) x
liftAP f (x :* xs) 
  =  ((f .) . (:-)) Prelude.<$> x Prelude.<*> liftAP id xs

And for types with a monad instance, we can define it in terms of >>=:

liftAM :: (Monad f, Suitable f b) 
       => (Vect xs -> b) -> (AppVect f xs -> f b)
liftAM f NilA = pure (f Nil)
liftAM f (x :* NilA) = fmap (f . (:-Nil)) x
liftAM f (x :* xs) = x >>= \y -> liftAM (f . (y:-)) xs

Efficiency

This approach is really slow. Every function wraps up its arguments in a Vect, and it’s just generally awful.

What about not wrapping up the function? Type families can help here:

type family FunType (xs :: [*]) (y :: *) :: * where
  FunType '[] y = y
  FunType (x ': xs) y = x -> FunType xs y

It gets really difficult to define liftA using <*> now, though. liftAM, on the other hand, is a breeze:

liftAM :: Monad f => FunType xs a -> AppVect f xs -> f a
liftAM f Nil = pure f
liftAM f (x :< xs) = x >>= \y -> liftAM (f y) xs

And no vector constructors on the right of the bind!

Still, no decent definition using <*>. The problem is that we’re using a cons-list to represent a function’s arguments, but <*> is left-associative, so it builds up arguments as a snoc list. Lets try using a snoc-list as the type family:

infixl 5 :>
data AppVect f xs where
  Nil :: AppVect f '[]
  (:>) :: AppVect f xs -> f x -> AppVect f (x ': xs)

type family FunType (xs :: [*]) (y :: *) :: * where
  FunType '[] y = y
  FunType (x ': xs) y = FunType xs (x -> y)

liftA
    :: Suitable f a
    => FunType xs a -> AppVect f xs -> f a

liftAP now gets a natural definition:

liftAP :: Prelude.Applicative f => FunType xs a -> AppVect f xs -> f a
liftAP f Nil = Prelude.pure f
liftAP f (Nil :> xs) = Prelude.fmap f xs
liftAP f (ys :> xs) = liftAP f ys Prelude.<*> xs

But what about liftAM? It’s much more difficult, fundamentally because >>= builds up arguments as a cons-list. To convert between the two efficiently, we need to use the trick for reversing lists efficiently: build up the reversed list as you go.

liftAM :: (Monad f, Suitable f a) => FunType xs a -> AppVect f xs -> f a
liftAM = go pure where
  go :: (Suitable f b, Monad f) 
     => (a -> f b) -> FunType xs a -> AppVect f xs -> f b
  go f g Nil = f g
  go f g (xs :> x) = go (\c -> x >>= f . c) g xs

Using these definitions, we can make Set, Text, and all the rest of them applicatives, while preserving the applicative operations. Also, from my preliminary testing, there seems to be no overhead in using these new definitions for <*>.

Normalized Embedding

In Sculthorpe et al. (2013), there’s discussion of this type:

data NM :: (* -> Constraint) -> (* -> *) -> * -> * where
  Return :: a -> NM c t a
  Bind :: c x => t x -> (x -> NM c t a) -> NM c t a

This type allows constrained monads to become normal monads. It can be used for the same purpose as the FreeT type from above. In the paper, the free type is called RCodT.

One way to look at the type is as a concrete representation of the monad class, with each method being a constructor.

You might wonder if there are similar constructs for functor and applicative. Functor is simple:

data NF :: (* -> Constraint) -> (* -> *) -> * -> * where
  FMap :: c x => (x -> a) -> t x -> NF c t a

Again, this can conform to functor (and only functor), and can be interpreted when the final type is Suitable.

Like above, it has a continuation version, Yoneda.

For applicatives, though, the situation is different. In the paper, they weren’t able to define a transformer for applicatives that could be interpreted in some restricted applicative. I needed one because I wanted to use -XApplicativeDo notation: the desugaring uses <*>, not the liftAn functions, so I wanted to construct a free applicative using <*>, and run it using the lift functions. What I managed to cobble to gether doesn’t really solve the problem, but it works for -XApplicativeDo!

The key with a lot of this was realizing that <*> is snoc, not cons. Using a free applicative:

data Free f a where
  Pure :: a -> Free f a
  Ap :: Free f (a -> b) -> f a -> Free f b

instance Prelude.Functor (Free f) where
  fmap f (Pure a) = Pure (f a)
  fmap f (Ap x y) = Ap ((f .) Prelude.<$> x) y

instance Prelude.Applicative (Free f) where
  pure = Pure
  Pure f <*> y = Prelude.fmap f y
  Ap x y <*> z = Ap (flip Prelude.<$> x Prelude.<*> z) y

This type can conform to Applicative and Functor no problem. And all it needs to turn back into a constrained applicative is for the outer type to be suitable:

lift :: f a -> Free f a
lift = Ap (Pure id)

lower
    :: forall f a c.
       Free f a
    -> (forall xs. FunType xs a -> AppVect f xs -> f c)
    -> f c
lower (Pure x) f = f x Nil
lower (Ap fs x :: Free f a) f =
    lower fs (\ft av -> f ft (av :> x))

lowerConstrained
    :: (Constrained.Applicative f, Suitable f a)
    => Free f a -> f a
lowerConstrained x = lower x liftA

There’s probably a more efficient way to encode it, though.

Sculthorpe, Neil, Jan Bracker, George Giorgidze, and Andy Gill. 2013. “The Constrained-monad Problem.” In Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming, 287–298. ICFP ’13. New York, NY, USA: ACM. doi:10.1145/2500365.2500602. http://ku-fpg.github.io/files/Sculthorpe-13-ConstrainedMonad.pdf.

Semirings

Donnacha Oisín Kidney — Thu, 17 Nov 2016 00:00:00 UT

Posted on November 17, 2016

Tags: Haskell, Semirings

{-# LANGUAGE GeneralizedNewtypeDeriving, TypeFamilies #-}
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
{-# LANGUAGE PatternSynonyms, ViewPatterns, LambdaCase #-}
{-# LANGUAGE RankNTypes, FlexibleInstances, FlexibleContexts #-}
{-# LANGUAGE OverloadedStrings, OverloadedLists, MonadComprehensions #-}

module Semirings where

import qualified Data.Map.Strict as Map
import           Data.Map.Strict      (Map)
import           Data.Monoid  hiding  (Endo(..))
import           Data.Foldable hiding (toList)
import           Control.Applicative
import           Control.Arrow        (first)
import           Control.Monad.Cont
import           Data.Functor.Identity
import           GHC.Exts
import           Data.List hiding     (insert)
import           Data.Maybe           (mapMaybe)

I’ve been playing around a lot with semirings recently. A semiring is anything with addition, multiplication, zero and one. You can represent that in Haskell as:

class Semiring a where
  zero :: a
  one  :: a
  infixl 7 <.>
  (<.>) :: a -> a -> a
  infixl 6 <+>
  (<+>) :: a -> a -> a

It’s kind of like a combination of two monoids. It has the normal monoid laws:

x <+> (y <+> z) = (x <+> y) <+> z
x <.> (y <.> z) = (x <.> y) <.> z
x <+> zero = zero <+> x = x
x <.> one  = one  <.> x = x

And a few extra:

x <+> y = y <+> x
x <.> (y <+> z) = (x <.> y) <+> (x <.> z)
(x <+> y) <.> z = (x <.> z) <+> (y <.> z)
zero <.> a = a <.> zero = zero

I should note that what I’m calling a semiring here is often called a rig. I actually prefer the name “rig”: a rig is a ring without negatives (cute!); whereas a semiring is a rig without neutral elements, which mirrors the definition of a semigroup. The nomenclature in this area is a bit of a mess, though, so I went with the more commonly-used name for the sake of googleability.

At first glance, it looks quite numeric. Indeed, PureScript uses it as the basis for its numeric hierarchy. (In my experience so far, it’s nicer to use than Haskell’s Num)

instance Semiring Integer where
  zero = 0
  one  = 1
  (<+>) = (+)
  (<.>) = (*)

instance Semiring Double where
  zero = 0
  one  = 1
  (<+>) = (+)
  (<.>) = (*)

However, there are far more types which can form a valid Semiring instance than can form a valid Num instance: the negate method, for example, excludes types representing the natural numbers:

newtype ChurchNat = ChurchNat 
  { runNat :: forall a. (a -> a) -> a -> a}
 
data Nat = Zero | Succ Nat

These form perfectly sensible semirings, though:

instance Semiring ChurchNat where
  zero = ChurchNat (const id)
  one = ChurchNat ($)
  ChurchNat n <+> ChurchNat m = ChurchNat (\f -> n f . m f)
  ChurchNat n <.> ChurchNat m = ChurchNat (n . m)

instance Semiring Nat where
  zero = Zero
  one = Succ Zero
  Zero <+> x = x
  Succ x <+> y = Succ (x <+> y)
  Zero <.> _ = Zero
  Succ Zero <.> x =x
  Succ x <.> y = y <+> (x <.> y)

The other missing method is fromInteger, which means decidedly non-numeric types are allowed:

instance Semiring Bool where
  zero = False
  one  = True
  (<+>) = (||)
  (<.>) = (&&)

We can provide a more general definition of the Sum and Product newtypes from Data.Monoid:

newtype Add a = Add
  { getAdd :: a
  } deriving (Eq, Ord, Read, Show, Semiring)

newtype Mul a = Mul
  { getMul :: a
  } deriving (Eq, Ord, Read, Show, Semiring)

instance Functor Add where
  fmap f (Add x) = Add (f x)

instance Applicative Add where
  pure = Add
  Add f <*> Add x = Add (f x)

I’m using Add and Mul here to avoid name clashing.

instance Semiring a => Monoid (Add a) where
  mempty = Add zero
  Add x `mappend` Add y = Add (x <+> y)

instance Semiring a => Monoid (Mul a) where
  mempty = Mul one
  Mul x `mappend` Mul y = Mul (x <.> y)
  
add :: (Semiring a, Foldable f) => f a -> a
add = getAdd . foldMap Add

mul :: (Semiring a, Foldable f) => f a -> a
mul = getMul . foldMap Mul

add and mul are equivalent to sum and product:

add xs == sum (xs :: [Integer])

mul xs == product (xs :: [Integer])

But they now work with a wider array of types: non-negative numbers, as we’ve seen, but specialised to Bool we get the familiar Any and All newtypes (and their corresponding folds).

add xs == or (xs :: [Bool])

mul xs == and (xs :: [Bool])

So far, nothing amazing. We avoid a little bit of code duplication, that’s all.

A Semiring Map

In older versions of Python, there was no native set type. In its place, dictionaries were used, where the values would be booleans. In a similar fashion, before the Counter type was added in 2.7, the traditional way of representing a multiset was using a dictionary where the values were integers.

Using semirings, both of these data structures can have the same type:

newtype GeneralMap a b = GeneralMap
  { getMap :: Map a b
  } deriving (Functor, Foldable, Show, Eq, Ord)

If operations are defined in terms of the Semiring class, the same code will work on a set and a multiset:

insert :: (Ord a, Semiring b) => a -> GeneralMap a b -> GeneralMap a b
insert x = GeneralMap . Map.insertWith (<+>) x one . getMap

delete :: Ord a => a -> GeneralMap a b -> GeneralMap a b
delete x = GeneralMap . Map.delete x . getMap

How to get back the dictionary-like behaviour, then? Well, operations like lookup and assoc are better suited to a Monoid constraint, rather than Semiring:

lookup :: (Ord a, Monoid b) => a -> GeneralMap a b -> b
lookup x = fold . Map.lookup x . getMap

assoc :: (Ord a, Applicative f, Monoid (f b)) 
      => a -> b -> GeneralMap a (f b) -> GeneralMap a (f b)
assoc k v = GeneralMap . Map.insertWith mappend k (pure v) . getMap

lookup is a function which should work on sets and multisets: however Bool and Integer don’t have Monoid instances. To fix this, we can use the Add newtype from earlier. The interface for each of these data structures can now be expressed like this:

type Set      a   = GeneralMap a (Add Bool)
type MultiSet a   = GeneralMap a (Add Integer)
type Map      a b = GeneralMap a (First b)
type MultiMap a b = GeneralMap a [b]

And each of the functions on the GeneralMap specialises like this:

-- Set
insert :: Ord a => a -> Set a -> Set a
lookup :: Ord a => a -> Set a -> Add Bool
delete :: Ord a => a -> Set a -> Set a

-- MultiSet
insert :: Ord a => a -> MultiSet a -> MultiSet a
lookup :: Ord a => a -> MultiSet a -> Add Integer
delete :: Ord a => a -> MultiSet a -> MultiSet a

-- Map
assoc  :: Ord a => a -> b -> Map a b -> Map a b
lookup :: Ord a => a -> Map a b -> First b
delete :: Ord a => a -> Map a b -> Map a b

-- MultiMap
assoc  :: Ord a => a -> b -> MultiMap a b -> MultiMap a b
lookup :: Ord a => a -> MultiMap a b -> [b]
delete :: Ord a => a -> MultiMap a b -> MultiMap a b

This was actually where I first came across semirings: I was trying to avoid code duplication for a trie implementation. I wanted to get the Boom Hierarchy (1981) (plus maps) from the same underlying implementation.

It works okay. On the one hand, it’s nice that you don’t have to wrap the map type itself to get the different behaviour. There’s only one delete function, which works on sets, maps, multisets, etc. I don’t need to import the TrieSet module qualified, to differentiate between the four delete functions I’ve written.

On the other hand, the Add wrapper is a pain: having lookup return the wrapped values is ugly, and the Applicative constraint is unwieldy (we only use it for pure). Both of those problems could be solved by using something like the Newtype or Wrapped class, which provide facilities for wrapping and unwrapping, but that might be overkill.

While Monoid and Semiring can take you pretty far, even to a Monoid instance:

fromList :: (Ord a, Semiring b, Foldable f) => f a -> GeneralMap a b
fromList = foldr insert (GeneralMap Map.empty)

fromAssocs :: (Ord a, Applicative f, Monoid (f b), Foldable t) 
           => t (a, b) -> GeneralMap a (f b)
fromAssocs = foldr (uncurry assoc) (GeneralMap Map.empty)

instance (Ord a, Monoid b) => Monoid (GeneralMap a b) where
  mempty = GeneralMap Map.empty
  mappend (GeneralMap x) (GeneralMap y) = 
    GeneralMap (Map.unionWith mappend x y)

singleton :: Semiring b => a -> GeneralMap a b
singleton x = GeneralMap (Map.singleton x one)

They seem to fall down around functions like intersection:

intersection :: (Ord a, Semiring b)
             => GeneralMap a b -> GeneralMap a b -> GeneralMap a b
intersection (GeneralMap x) (GeneralMap y) =
  GeneralMap (Map.intersectionWith (<.>) x y)

It works for sets, but it doesn’t make sense for multisets, and it doesn’t work for maps.

I couldn’t find a semiring for the map-like types which would give me a sensible intersection. I’m probably after a different algebraic structure.

A Probability Semiring

While looking for a semiring to represent a valid intersection, I came across the probability semiring. It’s just the normal semiring over the rationals, with a lower bound of 0, and an upper of 1.

It’s useful in some cool ways: you can combine it with a list to get the probability monad (Erwig and Kollmansberger 2006). There’s an example in PureScript’s Distributions package.

newtype Prob s a = Prob { runProb :: [(a,s)] }

There are some drawbacks to this representation, performance-wise. In particular, there’s a combinatorial explosion on every monadic bind. One of the strategies to reduce this explosion is to use a map:

newtype Prob s a = Prob { runProb :: Map a s }

Because this doesn’t allow duplicate keys, it will flatten the association list on every bind. Unfortunately, the performance gain doesn’t always materialize, and in some cases there’s a performance loss (Larsen 2011). Also, the Ord constraint on the keys prevents it from conforming to Monad (at least not without difficulty).

Interestingly, this type is exactly the same as the GeneralMap from before. This is a theme I kept running into, actually: the GeneralMap type represents not just maps, multimaps, sets, multisets, but also a whole host of other data structures.

Cont

Edward Kmett had an interesting blog post about “Free Modules and Functional Linear Functionals” (2011b). In it, he talked about this type:

infixr 0 $*
newtype Linear r a = Linear { ($*) :: (a -> r) -> r }

Also known as Cont, the continuation monad. It can encode the probability monad:

fromProbs :: (Semiring s, Applicative m) => [(a,s)] -> ContT s m a
fromProbs xs = ContT $ \k ->
  foldr (\(x,s) a -> liftA2 (<+>) (fmap (s<.>) (k x)) a) (pure zero) xs

probOfT :: (Semiring r, Applicative m) => (a -> Bool) -> ContT r m a -> m r
probOfT e c = runContT c (\x -> if e x then pure one else pure zero)

probOf :: Semiring r => (a -> Bool) -> Cont r a -> r
probOf e = runIdentity . probOfT e

uniform :: Applicative m => [a] -> ContT Double m a
uniform xs =
  let s = 1.0 / fromIntegral (length xs)
  in fromProbs (map (flip (,) s) xs)

Multiplication isn’t paid for on every bind, making this (potentially) a more efficient implementation than both the map and the association list.

You can actually make the whole thing a semiring:

instance (Semiring r, Applicative m) => Semiring (ContT r m a) where
  one  = ContT (const (pure one))
  zero = ContT (const (pure zero))
  f <+> g = ContT (\k -> liftA2 (<+>) (runContT f k) (runContT g k))
  f <.> g = ContT (\k -> liftA2 (<.>) (runContT f k) (runContT g k))

Which gives you a lovely Alternative instance:

instance (Semiring r, Applicative m) => Alternative (ContT r m) where
  (<|>) = (<+>)
  empty = zero

This sheds some light on what was going on with the unsatisfactory intersection function on GeneralMap: it’s actually multiplication. If you wanted to stretch the analogy and make GeneralMap conform to Semiring, you could use the empty map for zero, mappend for <+>, but you’d run into trouble for one. one is the map where every possible key has a value of one. In other words, you’d have to enumerate over every possible value for the keys. Interestingly, there’s kind of the inverse problem for Cont: while it has an easy Semiring instance, in order to inspect the values you have to enumerate over all the possible keys.

I now have a name for the probability monad / general map / Cont thing: a covector.

I think that the transformer version of Cont has a valid interpretation, also. If I ever understand Hirschowitz and Maggesi (2010) I’ll put it into a later follow-up post.

Conditional choice

As a short digression, you can beef up the <|> operator a little, with something like the conditional choice operator:

data BiWeighted s = s :|: s
infixl 8 :|:

(|>) :: (Applicative m, Semiring s)
     => BiWeighted s
     -> ContT s m a
     -> ContT s m a
     -> ContT s m a
((lp :|: rp) |> r) l =
  (mapContT.fmap.(<.>)) lp l <|> (mapContT.fmap.(<.>)) rp r
--
(<|) :: ContT s m a
     -> (ContT s m a -> ContT s m a)
     -> ContT s m a
l <| r = r l

infixr 0 <|
infixr 0 |>

probOf ('a'==) (uniform "a" <| 0.4 :|: 0.6 |> uniform "b")
0.4

UnLeak

If you fiddle around with the probability monad, you can break it apart in interesting ways. For instance, extracting the WriterT monad transformer gives you:

WriterT (Product Double) []

Eric Kidd describes it as PerhapsT: a Maybe with attached probability in his excellent blog post (and his paper in 2007).

Straight away, we can optimise this representation by transforming the leaky WriterT into a state monad:

newtype WeightedT s m a = WeightedT 
  { getWeightedT :: s -> m (a, s)
  } deriving Functor
  
instance Monad m => Applicative (WeightedT s m) where
  pure x = WeightedT $ \s -> pure (x,s)
  WeightedT fs <*> WeightedT xs = WeightedT $ \s -> do
    (f, p) <- fs s
    (x, t) <- xs p
    pure (f x, t)
  
instance Monad m => Monad (WeightedT s m) where
  WeightedT x >>= f = WeightedT $ \s -> do
    (x, p) <- x s
    getWeightedT (f x) p

I’m not sure yet, but I think this might have something to do with the isomorphism between Cont ((->) s) and State s (Kmett 2011a).

You can even make it look like a normal (non-transformer) writer with some pattern synonyms:

type Weighted s = WeightedT s Identity

pattern Weighted w <- (runIdentity . flip getWeightedT zero -> w) where
  Weighted (x,w) = WeightedT (\s -> Identity (x, s <.> w) )

And you can pretend that you’ve just got a normal tuple:

half :: a -> Weighted Double a
half x = Weighted (x, 0.5)

runWeighted :: Semiring s => Weighted s a -> (a, s)
runWeighted (Weighted w) = w

evalWeighted :: Semiring s => Weighted s a -> a
evalWeighted (Weighted (x,_)) = x

execWeighted :: Semiring s => Weighted s a -> s
execWeighted (Weighted (_,s)) = s

Free

Looking back at Cont, it is reminiscent of a particular encoding of the free monoid from Doel (2015):

newtype FreeMonoid a = FreeMonoid
  { forall m. Monoid m => (a -> m) -> m }

So possibly covectors represent the free semiring, in some way.

Another encoding which looks free-ish is one of the efficient implementations of the probability monad from Larsen (2011):

data Dist a where
  Certainly :: a -> Dist a -- only possible value
  Choice :: Probability -> Dist a -> Dist a -> Dist a
  Fmap :: (a -> b) -> Dist a -> Dist b
  Join :: Dist (Dist a) -> Dist a

This looks an awful lot like a weighted free alternative. Is it a free semiring, then?

Maybe. There’s a parallel between the relationship between monoids and semirings and applicatives and Alternatives (Rivas, Jaskelioff, and Schrijvers 2015). In a way, where monads are monoids in the category of endofunctors, alternatives are semirings in the category of endofunctors.

This parallel probably isn’t what I first thought it was. First of all, the above paper uses near-semirings, not semirings. A near-semiring is a semiring where the requirements for left distribution of multiplication over addition and commutative addition are dropped. Secondly, the class which most mirrors near-semirings is MonadPlus, not alternative. (alternative doesn’t have annihilation) Thirdly, right distribution of multiplication over addition isn’t required MonadPlus: it’s a further law required on top of the existing laws. Fourthly, most types in the Haskell ecosystem today which conform to MonadPlus don’t conform to this extra law: in fact, those that do seem to be lists of some kind or another.

A further class is probably needed on top of the two already there, with the extra laws (called Nondet in Fischer 2009).

An actual free near-semiring looks like this:

data Free f x = Free { unFree :: [FFree f x] }
data FFree f x = Pure x | Con (f (Free f x))

Specialised to the Identity monad, that becomes:

data Forest a = Forest { unForest :: [Tree x] }
data Tree x = Leaf x | Branch (Forest x)

De-specialised to the free monad transformer, it becomes:

newtype FreeT f m a = FreeT
  { runFreeT :: m (FreeF f a (FreeT f m a)) }

data FreeF f a b
  = Pure a
  | Free (f b)

type FreeNearSemiring f = FreeT f []

These definitions all lend themselves to combinatorial search (Spivey 2009; Fischer 2009; Piponi 2009), with one extra operation needed: wrap.

Odds

Does the odds monad fit in to any of this?

While WriterT (Product Rational) [] is a valid definition of the traditional probability monad, it’s not the same as the odds monad. If you take the odds monad, and parameterize it over the weight of the tail, you get this:

data Odds m a = Certain a | Choice (m (a, Odds a))

Which looks remarkably like ListT done right:

newtype ListT m a = ListT { next :: m (Step m a) }
data Step m a = Cons a (ListT m a) | Nil

That suggests a relationship between probability and odds:

WriterT (Product  Rational) [] = Probability
ListT   (Weighted Rational)    = Odds

ListT isn’t a perfect match, though: it allows empty lists. To correct this, you could use the Cofree Comonad:

data Cofree f a = a :< (f (Cofree f a))

Subbing in Maybe for f, you get a non-empty list. A weighted Maybe is basically PerhapsT, as was mentioned earlier.

Generalizing Semirings

Types in haskell also form a semiring.

(<.>) = (,)
one = ()

(<+>) = Either
zero = Void

There’s a subset of semirings which are star semirings. They have an operation $*$ such that:

$a* = 1 + aa* = 1 + a*a$

Or, as a class:

class Semiring a => StarSemiring a where
  star :: a -> a
  star x = one <+> plus x
  plus :: a -> a
  plus x = x <.> star x

Using this on types, you get:

star a = Either () (a, star a)

Which is just a standard list! Some pseudo-haskell on alternatives will give you:

star :: (Alternative f, Monoid a) => f a -> f a
star x = (x <.> star x) <+> pure mempty where
  (<.>) = liftA2 mappend
  (<+>) = <|>

Also known as many. (although note that this breaks all the laws)

The $*$ for rationals is defined as (Droste and Kuich 2009, p8):

$a* = \begin{cases} \frac{1}{1 - a} & \quad \text{if } & 0 \leq a \lt 1, \\ \infty & \quad \text{if } & a \geq 1. \end{cases}$

So, combining the probability with the type-level business, the star of Writer s a is:

Either (1, a) (a, s / (1 - s), star (Writer s a))

Or, to put it another way: the odds monad!

Endo

An endomorphism is a morphism from an object to itself. A less general definition (and the one most often used in Haskell) is a function of the type a -> a:

newtype Endo a = Endo { appEndo :: a -> a }

It forms a monoid under composition:

instance Monoid (Endo a) where
  mempty = Endo id
  mappend (Endo f) (Endo g) = Endo (f . g)

If the underlying type is itself a commutative monoid, it also forms near-semiring:

instance Monoid a => Semiring (Endo a) where
  Endo f <+> Endo g = Endo (\x -> f x <> g x)
  zero = Endo (const mempty)
  one = Endo id
  Endo f <.> Endo g = Endo (f . g)
  
instance (Monoid a, Eq a) => StarSemiring (Endo a) where
  star (Endo f) = Endo converge where
    converge x = x <> (if y == mempty then y else converge y) where
      y = f x

Here’s something interesting: there’s a similarity here to the semiring for church numerals. In fact, as far as I can tell, the functions are exactly the same when applied to endomorphisms of endomorphisms. To the extent that you could define church numerals with something as simple as this:

type ChurchEndoNat = forall a. Endo (Endo a)

And it works!

two, three :: ChurchEndoNat
two = one <+> one
three = one <+> two

unChurch :: Num a => ChurchEndoNat -> a
unChurch f = appEndo (appEndo f (Endo (1+))) 0

unChurch (two <.> three)
6

Regex

One of the most important applications (and a source of much of the notation) is regular expressions. In fact, the free semiring looks like a haskell datatype for regular expressions:

data FreeStar a
 = Gen a
 | Zer
 | One
 | FreeStar a :<+> FreeStar a
 | FreeStar a :<.> FreeStar a
 | Star (FreeStar a)

instance Semiring (FreeStar a) where
  (<+>) = (:<+>)
  (<.>) = (:<.>)
  zero = Zer
  one = One
  
instance StarSemiring (FreeStar a) where
  star = Star
  
interpret :: StarSemiring s => (a -> s) -> FreeStar a -> s
interpret f = \case
  Gen x -> f x
  Zer -> zero
  One -> one
  l :<+> r -> interpret f l <+> interpret f r
  l :<.> r -> interpret f l <.> interpret f r
  Star x -> star (interpret f x)

Then, interpreting the regex is as simple as writing an interpreter (with some help from Endo):

asRegex :: Eq a => FreeStar (a -> Bool) -> [a] -> Bool
asRegex fs = any null . appEndo (interpret f fs) . pure where
  f p = Endo . mapMaybe $ \case
    (x:xs) | p x -> Just xs
    _ -> Nothing

char' :: Eq a => a -> FreeStar (a -> Bool)
char' c = Gen (c==)

Actually, you don’t need the free version at all!

runRegex :: Eq a => Endo [[a]] -> [a] -> Bool
runRegex fs = any null . appEndo fs . pure

char :: Eq a => a -> Endo [[a]]
char c = Endo . mapMaybe $ \case
  (x:xs) | c == x -> Just xs
  _ -> Nothing

With some -XOverloadedStrings magic, you get a pretty nice interface:

instance IsString (Endo [String]) where
  fromString = mul . map char . reverse
  
(<^>) :: Semiring s => s -> s -> s
(<^>) = flip (<.>)

greet :: Endo [String]
greet = "H" <^> ("a" <+> "e") <^> "llo"

:set -XOverloadedStrings

runRegex greet "Hello"
True

runRegex greet "Hallo"
True

runRegex greet "Halo"
False

Efficiency

Of course, that’s about as slow as it gets when it comes to regexes. A faster representation is a nondeterministic finite automaton. One such implementation in haskell is Gabriel Gonzalez’s.

The regex type in that example can be immediately made to conform to Semiring and StarSemiring. However, it might be more interesting to translate the implementation into using semirings. The type of a regex looks like this:

type State = Int

{ _startingStates         :: Set State
, _transitionFunction     :: Char -> State -> Set State
, _acceptingStates        :: Set State }

The set data structure jumps out as an opportunity to sub in arbitrary semirings.Swapping in the GeneralMap is reasonably easy:

type State = Int

data Regex i s = Regex
  { _numberOfStates     :: Int 
  , _startingStates     :: GeneralMap State s
  , _transitionFunction :: i -> State -> GeneralMap State s
  , _acceptingStates    :: GeneralMap State s }

isEnd :: Semiring s => Regex i s -> s
isEnd (Regex _ as _ bs) = add (intersection as bs)

match :: Regex Char (Add Bool) -> String -> Bool
match r = getAdd . isEnd . foldl' run r where
  run (Regex n (GeneralMap as) f bs) i = Regex n as' f bs
    where as' = mconcat [ fmap (v<.>) (f i k)  | (k,v) <- Map.assocs as ]


satisfy :: Semiring s => (i -> s) -> Regex i (Add s)
satisfy predicate = Regex 2 as f bs
  where
    as = singleton 0
    bs = singleton 1

    f i 0 = assoc 1 (predicate i) mempty
    f _ _ = mempty

once :: Eq i => i -> Regex i (Add Bool)
once x = satisfy (== x)

shift :: Int -> GeneralMap State s -> GeneralMap State s
shift n = GeneralMap . Map.fromAscList . (map.first) (+ n) . Map.toAscList . getMap

instance (Semiring s, Monoid s) => Semiring (Regex i s) where

  one = Regex 1 (singleton 0) (\_ _ -> mempty) (singleton 0)
  zero = Regex 0 mempty (\_ _ -> mempty) mempty

  Regex nL asL fL bsL <+> Regex nR asR fR bsR = Regex n as f bs
    where
      n  = nL + nR
      as = mappend asL (shift nL asR)
      bs = mappend bsL (shift nL bsR)
      f i s | s < nL    = fL i s
            | otherwise = shift nL (fR i (s - nL))

  Regex nL asL fL bsL <.> Regex nR asR fR bsR = Regex n as f bs where

    n = nL + nR

    as = let ss = add (intersection asL bsL)
         in mappend asL (fmap (ss<.>) (shift nL asR))

    f i s =
        if s < nL
        then let ss = add (intersection r bsL)
             in mappend r (fmap (ss<.>) (shift nL asR))
        else shift nL (fR i (s - nL))
      where
        r = fL i s
    bs = shift nL bsR

instance (StarSemiring s, Monoid s) => StarSemiring (Regex i s) where
  star (Regex n as f bs) = Regex n as f' as
    where
      f' i s =
          let r = f i s
              ss = add (intersection r bs)
          in mappend r (fmap (ss<.>) as)

  plus (Regex n as f bs) = Regex n as f' bs
    where
      f' i s =
          let r = f i s
              ss = add (intersection r bs)
          in mappend r (fmap (ss<.>) as)


instance IsString (Regex Char (Add Bool)) where
  fromString = mul . map once

This begins to show some of the real power of using semirings and covectors. We have a normal regular expression implementation when we use the covector over bools. Use the probability semiring, and you’ve got probabilistic parsing.

Swap in the tropical semiring: a semiring over the reals where addition is the max function, and multiplication is addition of reals. Now you’ve got a depth-first parser.

That’s how you might swap in different interpretations. How about swapping in different implementations? Well, there might be some use to swapping in the CYK algorithm, or the Gauss-Jordan-Floyd-Warshall-McNaughton-Yamada algorithm (O’Connor 2011).

Alternatively, you can swap in the underlying data structure. Instead of a map, if you use an integer (each bit being a value, the keys being the bit position), you have a super-fast implementation (and the final implementation used in the original example). Finally, you could use a different representation of the state transfer function: a matrix.

Square Matrices

A square matrix can be understood as a map from pairs of indices to values. This lets us use it to represent the state transfer function. Take, for instance, a regular expression with three possible states. Its state transfer function might look like this:

$\text{transfer} = \begin{cases} 1 \quad & \{ 2, 3 \} \\ 2 \quad & \{ 1 \} \\ 3 \quad & \emptyset \end{cases}$

It has the type of:

State -> Set State

Where State is an integer. You can represent the set as a vector, where each position is a key, and each value is whether or not that key is present:

$\text{transfer} = \begin{cases} 1 \quad & 0 & 1 & 1 \\ 2 \quad & 1 & 0 & 0 \\ 3 \quad & 0 & 0 & 0 \end{cases}$

Then, the matrix representation is obvious:

$\text{transfer} = \left( \begin{array}{ccc} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{array} \right)$

This is the semiring of square matrices. It is, of course, yet another covector. The “keys” are the transfers: 1 -> 2 or 2 -> 3, represented by the indices of the matrix. The “values” are whether or not that transfer is permitted.

The algorithms for the usual semiring operations on matrices like this are well-known and well-optimized. I haven’t yet benchmarked them in Haskell using the matrix libraries, so I don’t know how they compare to the other approaches. In the meantime, there’s an elegant list-based implementation in Dolan (2013):

data Matrix a = Scalar a
              | Matrix [[a]]
              
mjoin :: (Matrix a, Matrix a, Matrix a, Matrix a) -> Matrix a
mjoin (Matrix ws, Matrix xs, Matrix ys, Matrix zs) =
  Matrix ((zipWith (++) ws xs) ++ (zipWith (++) ys zs))
  
msplit :: Matrix a -> (Matrix a, Matrix a, Matrix a, Matrix a)
msplit (Matrix (row:rows)) = 
  (Matrix [[first]], Matrix [top]
  ,Matrix left,      Matrix rest )
  where
    (first:top) = row
    (left,rest) = unzip (map (\(x:xs) -> ([x],xs)) rows)
    
instance Semiring a => Semiring (Matrix a) where
  zero = Scalar zero
  one = Scalar one
  Scalar x <+> Scalar y = Scalar (x <+> y)
  Matrix x <+> Matrix y =
    Matrix (zipWith (zipWith (<+>)) x y)
  Scalar x <+> m = m <+> Scalar x
  Matrix [[x]] <+> Scalar y = Matrix [[x <+> y]]
  x <+> y = mjoin (first <+> y, top, left, rest <+> y)
    where (first, top, left, rest) = msplit x
  Scalar x <.> Scalar y = Scalar (x <.> y)
  Scalar x <.> Matrix y = Matrix ((map.map) (x<.>) y)
  Matrix x <.> Scalar y = Matrix ((map.map) (<.>y) x)
  Matrix x <.> Matrix y = 
    Matrix [ [ foldl1 (<+>) (zipWith (<.>) row col) | col <- cols ] 
           | row <- x ] where cols = transpose y

instance StarSemiring a => StarSemiring (Matrix a) where
  star (Matrix [[x]]) = Matrix [[star x]]
  star m = mjoin (first' <+> top' <.> rest' <.> left'
                 ,top' <.> rest', rest' <.> left', rest')
    where
      (first, top, left, rest) = msplit m
      first' = star first
      top' = first' <.> top
      left' = left <.> first'
      rest' = star (rest <+> left' <.> top)

Permutation parsing

A lot of the use from semirings comes from “attaching” them to other values. Attaching a semiring to effects (in the form of an applicative) can give you repetition of those effects. The excellent ReplicateEffects library explores this concept in depth.

It’s based on this type:

data Replicate a b
  = Nil
  | Cons (Maybe b) (Replicate a (a -> b))

This type can be made to conform to Semiring (and Starsemiring, etc) trivially.

In the simplest case, it has the same behaviour as replicateM. Even the more complex combinators, like atLeast, can be built on Alternative:

atLeast :: Alternative f => Int -> f a -> f [a]
atLeast m f = go (max 0 m) where
  go 0 = many f
  go n = liftA2 (:) f (go (n-1))
  
atMost :: Alternative f => Int -> f a -> f [a]
atMost m f = go (max 0 m) where
  go 0 = pure []
  go n = liftA2 (:) f (go (n-1)) <|> pure []

There are two main benefits over using the standard alternative implementation. First, you can choose greedy or lazy evaluation of the effects after the replication is built.

Secondly, the order of the effects doesn’t have to be specified. This allows you to execute permutations of the effects, in a permutation parser, for instance. The permutation is totally decoupled from the declaration of the repetition (it’s in a totally separate library, in fact: PermuteEffects). Its construction is reminiscent of the free alternative.

Having the replicate type conform to Semiring is all well and good: what I’m interested in is seeing if its implementation is another semiring-based object in disguise. I’ll revisit this in a later post.

Algebraic Search

List comprehension notation is one of my all-time favourite bits of syntactic sugar. It seems almost too declarative to have a reasonable implementation strategy. The vast majority of the time, it actually works in a sensible way. There are exceptions, though. Take a reasonable definition of a list of Pythagorean triples:

[ (x,y,z) | x <- [1..], y <- [1..], z <- [1..], x*x + y*y == z*z ]

This expression will diverge without yielding a single triple. It will search through every possible value for z before incrementing either x or y. Since there are infinite values for z, it will never find a triple. In other words, vanilla list comprehensions in Haskell perform depth-first search.

In order to express other kinds of search (either breadth-first or depth-bounded), different monads are needed. These monads are explored in Fischer (2009) and Spivey (2009).

You can actually use the exact same notation as above with arbitrary alternative monads using -XMonadComprehensions and -XOverloadedLists.

trips :: ( Alternative m
         , Monad m
         , IsList (m Integer)
         , Enum (Item (m Integer))
         , Num (Item (m Integer)))
      => m (Integer,Integer,Integer)
trips = [ (x,y,z) | x <- [1..], y <- [1..], z <- [1..], x*x + y*y == z*z ]

So then, here’s the challenge: swap in different ms via a type annotation, and prevent trips from diverging before getting any triples.

As one example, here’s some code adapted from Fischer (2009):

instance (Monoid r, Applicative m) => Monoid (ContT r m a) where
  mempty = ContT (const (pure mempty))
  mappend (ContT f) (ContT g) = ContT (\x -> liftA2 mappend (f x) (g x))
  
newtype List a = List 
  { runList :: forall m. Monoid m => Cont m a } deriving Functor

instance Foldable List where foldMap = flip (runCont.runList)
  
instance Show a => Show (List a) where show = show . foldr (:) []

instance Monoid (List a) where
  mappend (List x) (List y) = List (mappend x y)
  mempty = List mempty
  
instance Monoid a => Semiring (List a) where
  zero = mempty
  (<+>) = mappend
  (<.>) = liftA2 mappend
  one = pure mempty

bfs :: List a -> [a]
bfs = toList . fold . levels . anyOf

newtype Levels a = Levels { levels :: [List a] } deriving Functor

instance Applicative Levels where
  pure x = Levels [pure x]
  Levels fs <*> Levels xs = Levels [ f <*> x | f <- fs, x <- xs ]
  
instance Alternative Levels where
  empty = Levels []
  Levels x <|> Levels y = Levels (mempty : merge x y)

instance IsList (List a) where
  type Item (List a) = a
  fromList = anyOf
  toList = foldr (:) []
  
instance Applicative List where
  pure x = List (pure x)
  (<*>) = ap

instance Alternative List where
  empty = mempty
  (<|>) = mappend

instance Monad List where
  x >>= f = foldMap f x

anyOf :: (Alternative m, Foldable f) => f a -> m a
anyOf = getAlt . foldMap (Alt . pure)

merge :: [List a] -> [List a] -> [List a]
merge []      ys    = ys
merge xs      []    = xs
merge (x:xs) (y:ys) = mappend x y : merge xs ys

take 3 (bfs trips)
[(3,4,5),(4,3,5),(6,8,10)]

The only relevance to semirings is the merge function. The semiring over lists is the semiring over polynomials:

instance Semiring a => Semiring [a] where
  one = [one]
  zero = []
  [] <+> ys = ys
  xs <+> [] = xs
  (x:xs) <+> (y:ys) = (x <+> y) : (xs <+> ys)
  [] <.> _ = []
  _ <.> [] = []
  (x:xs) <.> (y:ys) =
    (x <.> y) : (map (x <.>) ys <+> map (<.> y) xs <+> (xs <.> ys))

The <+> is the same as the merge function. I think the <.> might be a more valid definition of the <*> function, also.

instance Applicative Levels where
  pure x = Levels [pure x]
  Levels [] <*> _ = Levels []
  _ <*> Levels [] = Levels []
  Levels (f:fs) <*> Levels (x:xs) = Levels $
    (f <*> x) : levels (Levels (fmap (f <*>) xs) 
             <|> Levels (fmap (<*> x) fs)
             <|> (Levels fs <*> Levels xs))

Conclusion

I’ve only scratched the surface of this abstraction. There are several other interesting semirings: polynomials, logs, Viterbi, Łukasiewicz, languages, multisets, bidirectional parsers, etc. Hopefully I’ll eventually be able to put this stuff into a library or something. In the meantime, I definitely will write some posts on the application to context-free parsing, bidirectional parsing (I just read Breitner (2016)) and search.

References

Boom, H. J. 1981. “Further thoughts on Abstracto.” Working Paper ELC-9, IFIP WG 2.1. http://www.kestrel.edu/home/people/meertens/publications/papers/Abstracto_reader.pdf.

Breitner, Joachim. 2016. “Showcasing Applicative.” Joachim Breitner’s Blog. http://www.joachim-breitner.de/blog/710-Showcasing_Applicative.

Doel, Dan. 2015. “Free Monoids in Haskell.” The Comonad.Reader. http://comonad.com/reader/2015/free-monoids-in-haskell/.

Droste, Manfred, and Werner Kuich. 2009. “Semirings and Formal Power Series.” In Handbook of Weighted Automata, ed by. Manfred Droste, Werner Kuich, and Heiko Vogler, 1:3–28. Monographs in Theoretical Computer Science. An EATCS Series. Berlin, Heidelberg: Springer Berlin Heidelberg. http://staff.mmcs.sfedu.ru/~ulysses/Edu/Marktoberdorf_2009/working_material/Esparsa/Kuich.%20Semirings%20and%20FPS.pdf.

Fischer, Sebastian. 2009. “Reinventing Haskell Backtracking.” In Informatik 2009, Im Fokus das Leben (ATPS’09). GI Edition. http://www-ps.informatik.uni-kiel.de/~sebf/data/pub/atps09.pdf.

Hirschowitz, André, and Marco Maggesi. 2010. “Modules over monads and initial semantics.” Information and Computation 208 (5). Special Issue: 14th Workshop on Logic, Language, Information and Computation (WoLLIC 2007) (May): 545–564. doi:10.1016/j.ic.2009.07.003. https://pdfs.semanticscholar.org/3e0c/c79e8cda9246cb954da6fd8aaaa394fecdc3.pdf.

Kidd, Eric. 2007. “Build your own probability monads.” http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.9502&rep=rep1&type=pdf.

Kmett, Edward. 2011a. “Free Monads for Less (Part 2 of 3): Yoneda.” The Comonad.Reader. http://comonad.com/reader/2011/free-monads-for-less-2/.

———. 2011b. “Modules and Functional Linear Functionals.” The Comonad.Reader. http://comonad.com/reader/2011/free-modules-and-functional-linear-functionals/.

Larsen, Ken Friis. 2011. “Memory Efficient Implementation of Probability Monads.” http://www.diku.dk/~kflarsen/t/ProbMonad-unpublished.pdf.

O’Connor, Russell. 2011. “A Very General Method of Computing Shortest Paths.” Russell O’Connor’s Blog. http://r6.ca/blog/20110808T035622Z.html.

Piponi, Dan. 2009. “A Monad for Combinatorial Search with Heuristics.” A Neighborhood of Infinity. http://blog.sigfpe.com/2009/07/monad-for-combinatorial-search-with.html.

Probability Trees

Donnacha Oisín Kidney — Fri, 30 Sep 2016 00:00:00 UT

Posted on September 30, 2016

Tags: Haskell, Probability

{-# language DeriveFunctor, DeriveFoldable #-}
{-# language PatternSynonyms, ViewPatterns #-}

module ProbTree where

import Data.Monoid
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import Control.Arrow
import Data.Ratio
import Data.Foldable

Previously, I tried to figure out how to make the probability monad more “listy”. I read a little more about the topic (especially Erwig and Kollmansberger 2006; and Kidd 2007).

I then thought about what a probability monad would look like if it was based on other data structures. I feel like the standard version really wants to be:

newtype ProperProb a = ProperProb
  { yes :: Map a (Product Rational) }

But of course a monad instance isn’t allowed.

Similar to a map, though, is a binary tree:

data BinaryTree a = Leaf
                  | Node (BinaryTree a) a (BinaryTree a)

And it feels better for probability - flatter, somehow. Transmuting it into a probability-thing:

data Odds a = Certain a
            | Choice (Odds a) Rational (Odds a)
            deriving (Eq, Functor, Foldable, Show)

That looks good to me. A choice between two different branches feels more natural than a choice between a head and a tail.

The fold is similar to before, with an unfold for good measure:

foldOdds :: (b -> Rational -> b -> b) -> (a -> b) -> Odds a -> b
foldOdds f b = r where
  r (Certain x) = b x
  r (Choice xs p ys) = f (r xs) p (r ys)
  
unfoldOdds :: (b -> Either a (b,Rational,b)) -> b -> Odds a
unfoldOdds f = r where
  r b = case f b of
    Left a -> Certain a
    Right (x,p,y) -> Choice (r x) p (r y)
  
fi :: Bool -> a -> a -> a
fi True  t _ = t
fi False _ f = f

I changed the pattern synonym a little:


unRatio :: Num a => Rational -> (a,a)
unRatio = numerator   &&& denominator 
      >>> fromInteger *** fromInteger

pattern n :% d <- (unRatio -> (n,d))

Then, the probOf function:

probOf :: Eq a => a -> Odds a -> Rational
probOf e = foldOdds f b where
  b x = fi (e == x) 1 0
  f x (n:%d) y = (x * n + y * d) / (n + d)

This version doesn’t have the option for short-circuiting on the first value it finds.

For generating from lists, you can try to evenly divide the list among each branch.

fromListOdds :: (([b], Int) -> Integer) -> (b -> a) -> [b] -> Maybe (Odds a)
fromListOdds fr e = r where
  r [] = Nothing
  r xs = Just (unfoldOdds f (xs, length xs))
  f ([x],_) = Left (e x)
  f (xs ,n) = Right ((ys,l), fr (ys,l) % fr (zs,r), (zs,r)) where
    l = n `div` 2
    r = n - l
    (ys,zs) = splitAt l xs

equalOdds :: [a] -> Maybe (Odds a)
equalOdds = fromListOdds (fromIntegral . snd) id

fromDistrib :: [(a,Integer)] -> Maybe (Odds a)
fromDistrib = fromListOdds (sum . map snd . fst) fst

What’s really nice about this version is the fact that the old append is just the Choice constructor, leaving the instances to be really nice:

flatten :: Odds (Odds a) -> Odds a
flatten = foldOdds Choice id

instance Applicative Odds where
  pure = Certain
  fs <*> xs = flatten (fmap (<$> xs) fs)
  
instance Monad Odds where
  x >>= f = flatten (f <$> x)

Finally, as a bonus, to remove duplicates:

lcd :: Foldable f => f Rational -> Integer
lcd = foldl' (\a e -> lcm a (denominator e)) 1

toDistrib :: Odds a -> [(a,Integer)]
toDistrib = factorOut . foldOdds f b where
  b x = [(x,1)]
  f l p r = (map.fmap) (n%t*) l ++ (map.fmap) (d%t*) r where
    n = numerator p
    d = denominator p
    t = n + d
  factorOut xs = (map.fmap) (numerator . (lcd'*)) xs where
    lcd' = fromIntegral . lcd . map snd $ xs

counts :: (Ord a, Num n) => [(a,n)] -> [(a,n)]
counts = 
  Map.assocs . 
  Map.fromListWith (+)
      
compress :: Ord a => Odds a -> Odds a
compress xs = let Just ys = (fromDistrib . counts . toDistrib) xs in ys

After reading yet more on this, I found that the main issue with the monad is its performance. Two articles in particular: Larsen (2011), and Ścibior, Ghahramani, and Gordon (2015), refer to a GADT implementation of the monad which maximises laziness.

References

Kidd, Eric. 2007. “Build your own probability monads.”

Larsen, Ken Friis. 2011. “Memory Efficient Implementation of Probability Monads.”

A Different Probability Monad

Donnacha Oisín Kidney — Tue, 27 Sep 2016 00:00:00 UT

Posted on September 27, 2016

Tags: Haskell, Probability

One of the more unusual monads is the “probability monad”:

{-# language PatternSynonyms, ViewPatterns #-}
{-# language DeriveFunctor, DeriveFoldable #-}
{-# language BangPatterns #-}

module Prob where

import Control.Arrow
import Data.Ratio
import Data.Foldable

newtype Probability a = Probability
  { runProb :: [(a,Rational)] }
  
data Coin = Heads | Tails

toss :: Probability Coin
toss = Probability [(Heads, 1 % 2), (Tails, 1 % 2)]

Although it’s a little inefficient, it’s an elegant representation. I’ve written about it before here.

It has some notable deficiencies, though. For instance: the user has to constantly check that all the probabilities add up to one. Its list can be empty, which doesn’t make sense. Also, individual outcomes can appear more than once in the same list.

A first go a fixing the problem might look something like this:

newtype Distrib a = Distrib
  { runDist :: [(a,Rational)] }

tossProb :: Distrib Coin
tossProb = Distrib [(Heads, 1), (Tails, 1)]

The type is the same as before: it’s the semantics which have changed. The second field of the tuples no longer have to add up to one. The list can still be empty, though, and now finding the probability of, say, the head, looks like this:

probHead :: Distrib a -> Rational
probHead (Distrib xs@((_,p):_)) = p / sum [ q | (_,q) <- xs ]

Infinite lists aren’t possible, either.

One other way to look at the problem is to mimic the structure of cons-lists. Something like this:

data Odds a = Certainly a
            | Odds a Rational (Odds a)
            deriving (Eq, Functor, Foldable, Show)

Here, the Odds constructor (analogous to (:)) contains the betting-style odds of the head element vs. the rest of the list. The coin from before is represented by:

tossOdds :: Odds Coin
tossOdds = Odds Heads (1 % 1) (Certainly Tails)

This representation has tons of nice properties. First, let’s use some pattern-synonym magic for rationals:

pattern (:%) :: Integer -> Integer -> Rational
pattern n :% d <- (numerator &&& denominator -> (n,d)) where
  n :% d = n % d

Then, finding the probability of the head element is this:

probHeadOdds :: Odds a -> Rational
probHeadOdds (Certainly _) = 1
probHeadOdds (Odds _ (n :% d) _) = n :% (n + d)

The representation can handle infinite lists no problem:

probHeadOdds (Odds 'a' (1 :% 1) undefined)
1 % 2

Taking the tail preserves semantics, also. To do some more involved manipulation, a fold helper is handy:

foldOdds :: (a -> Rational -> b -> b) -> (a -> b) -> Odds a -> b
foldOdds f b = r where
  r (Certainly x) = b x
  r (Odds x p xs) = f x p (r xs)

You can use this function to find the probability of a given item:

probOfEvent :: Eq a => a -> Odds a -> Rational
probOfEvent e = foldOdds f b where
  b x = if e == x then 1 else 0
  f x n r = (if e == x then n else r) / (n + 1)

This assumes that each item only occurs once. A function which combines multiple events might look like this:

probOf :: (a -> Bool) -> Odds a -> Rational
probOf p = foldOdds f b where
  b x = if p x then 1 else 0
  f x n r = (if p x then r + n else r) / (n + 1)

Some utility functions to create Odds:

equalOdds :: Foldable f => f a -> Maybe (Odds a)
equalOdds xs = case length xs of
  0 -> Nothing
  n -> Just (foldr f undefined xs (n - 1)) where
    f y a 0 = Certainly y
    f y a n = Odds y (1 % fromIntegral n) (a (n - 1))

fromDistrib :: [(a,Integer)] -> Maybe (Odds a)
fromDistrib [] = Nothing
fromDistrib xs = Just $ f (tot*lst) xs where
  (tot,lst) = foldl' (\(!t,_) e -> (t+e,e)) (0,undefined) (map snd xs)
  f _ [(x,_)] = Certainly x
  f n ((x,p):xs) = Odds x (mp % np) (f np xs) where
    mp = p * lst
    np = n - mp
                  
probOfEach :: Eq a => a -> Odds a -> Rational
probOfEach x xs = probOf (x==) xs

propOf :: Eq a => a -> [a] -> Maybe Rational
propOf _ [] = Nothing
propOf x xs = Just . uncurry (%) $
  foldl' (\(!n,!m) e -> (if x == e then n+1 else n, m+1)) (0,0) xs

propOf x xs == fmap (probOfEach x) (equalOdds xs)

And finally, the instances:

append :: Odds a -> Rational -> Odds a -> Odds a
append = foldOdds f Odds where
  f e r a p ys = Odds e ip (a op ys) where
    ip = p * r / (p + r + 1)
    op = p / (r + 1)

flatten :: Odds (Odds a) -> Odds a
flatten = foldOdds append id

instance Applicative Odds where
  pure = Certainly
  fs <*> xs = flatten (fmap (<$> xs) fs)
  
instance Monad Odds where
  x >>= f = flatten (f <$> x)

Revisiting a Trie in Haskell

Donnacha Oisín Kidney — Mon, 26 Sep 2016 00:00:00 UT

Posted on September 26, 2016

Part 2 of a 2-part series on tries

Tags: Haskell, Data Structures

Conforming to Foldable

When I ended the last post, I had a nice Trie datatype, with plenty of functions, but I couldn’t get it to conform to the standard Haskell classes. The problem was to do with the type variables in the Trie:

{-# language GADTs, FlexibleInstances, TypeFamilies #-}
{-# language DeriveFoldable, DeriveFunctor, DeriveTraversable #-}
{-# language FunctionalDependencies, FlexibleInstances #-}

module Tries where

import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import Data.Foldable hiding (toList)
import Prelude hiding (lookup)
import Data.Monoid
import GHC.Exts (IsList(..))

data OldTrie a = OldTrie
  { otEndHere  :: Bool
  , otChildren :: Map a (OldTrie a) }

Although the type variable is a, the trie really contains lists of as. At least, that’s what’s reflected in functions like insert, member, etc.:

member :: (Foldable f, Ord a) => f a -> OldTrie a -> Bool
member = foldr f otEndHere where
  f e a = maybe False a . Map.lookup e . otChildren
  
otInsert :: (Foldable f, Ord a) => f a -> OldTrie a -> OldTrie a
otInsert = foldr f b where
  b (OldTrie _ c) = OldTrie True c
  f e a (OldTrie n c) = OldTrie n (Map.alter (Just . a . fold) e c)
  
instance Ord a => Monoid (OldTrie a) where
  mempty = OldTrie False mempty
  OldTrie v c `mappend` OldTrie t d = 
    OldTrie (v || t) (Map.unionWith (<>) c d)

Realistically, the type which the trie contains is more like:

Foldable f => Trie (f a)

That signature strongly hints at GADTs, as was indicated by this stackoverflow answer. The particular GADT which is applicable here is this:

data TrieSet a where TrieSet :: Bool -> Map a (TrieSet [a]) -> TrieSet [a]

tsEndHere :: TrieSet [a] -> Bool
tsEndHere (TrieSet e _) = e

tsChildren :: TrieSet [a] -> Map a (TrieSet [a])
tsChildren (TrieSet _ c) = c

tsInsert :: (Foldable f, Ord a) => f a -> TrieSet [a] -> TrieSet [a]
tsInsert = foldr f b where
  b :: TrieSet [a] -> TrieSet [a]
  f :: Ord a => a -> (TrieSet [a] -> TrieSet [a]) -> TrieSet [a] -> TrieSet [a]

  b (TrieSet _ c) = TrieSet True c
  f e a (TrieSet n c) = TrieSet n (Map.alter (Just . a . fold) e c)
  
instance Ord a => Monoid (TrieSet [a]) where
  mempty = TrieSet False Map.empty
  TrieSet v c `mappend` TrieSet t d = 
    TrieSet (v || t) (Map.unionWith (<>) c d)

Why lists and not a general Foldable? Well, for the particular use I had in mind (conforming to the Foldable typeclass), I need (:).

instance Foldable TrieSet where
  foldr f b (TrieSet e c) = if e then f [] r else r where
    r = Map.foldrWithKey (flip . g . (:)) b c
    g k = foldr (f . k)

With some more helper functions, the interface becomes pretty nice:

instance Show a => Show (TrieSet [a]) where
  showsPrec d t = 
    showParen 
      (d > 10)
      (showString "fromList " . shows (foldr (:) [] t))

instance Ord a => IsList (TrieSet [a]) where
  type Item (TrieSet [a]) = [a]
  fromList = foldr tsInsert mempty
  toList = foldr (:) []

The trie has the side-effect of lexicographically sorting what it’s given:

:set -XGADTs

fromList ["ced", "abc", "ced", "cb", "ab"] :: TrieSet String
fromList ["ab","abc","cb","ced"]

Further Generalizing

Most implementations of tries that I’ve seen are map-like data structures, rather than set-like. In other words, instead of holding a Bool at the value position, it holds a Maybe something.

data Trie a b = Trie
  { endHere  :: b
  , children :: Map a (Trie a b) 
  } deriving (Eq, Ord, Show, Functor, Foldable, Traversable)

This is a much more straightforward datatype. Foldable can even be automatically derived.

However, I haven’t made the endHere field a Maybe a. I want to be able to write something like this:

type TrieSet [a] = Trie a Bool
type TrieMap a b = Trie a (Maybe b)

And have it automatically choose the implementation of the functions I need¹.

To do that, though, I’ll need to write the base functions, agnostic of the type of b. I can rely on something like Monoid, though:

instance (Ord a, Monoid b) => Monoid (Trie a b) where
  mempty = Trie mempty Map.empty
  mappend (Trie v k) (Trie t l) = 
    Trie (v <> t) (Map.unionWith (<>) k l)

In fact, quite a lot of functions naturally lend themselves to this fold + monoid style:

lookup :: (Ord a, Monoid b, Foldable f) 
       => f a -> Trie a b -> b
lookup = foldr f endHere where
  f e a = foldMap a . Map.lookup e . children

insert' :: (Foldable f, Ord a, Monoid b) 
       => f a -> b -> Trie a b -> Trie a b
insert' xs v = foldr f b xs where
  b (Trie p c) = Trie (v <> p) c
  f e a (Trie n c) = 
    Trie n (Map.alter (Just . a . fold) e c)

A monoid is needed for the values, though, and neither Bool nor ∀ a. Maybe a conform to Monoid. Looking back to the implementation of the trie-set, the (||) function has been replaced by mappend. There is a newtype wrapper in Data.Monoid which has exactly this behaviour, though: Any.

Using that, the type signatures specialize to:

type TrieSet a = Trie a Any
lookup :: (Ord a, Foldable f) 
       => f a -> TrieSet a -> Any
insert :: (Ord a, Foldable f) 
       => f a -> Any -> TrieSet a -> TrieSet a

Similarly, for Maybe, there’s both First and Last. They have the behaviour:

First (Just x) <> First (Just y) == First (Just x)

Last  (Just x) <> Last  (Just y) == Last  (Just y)

I think it makes more sense for a value inserted into a map to overwrite whatever was there before. Since the newer value is on the left in the mappend, then, First makes most sense.

type TrieMap a b = Trie a (First b)
lookup :: (Ord a, Foldable f) => f a -> TrieMap a b -> First b
insert :: (Ord a, Foldable f) 
       => f a -> First b -> TrieMap a b -> TrieMap a b

There are some other ways that you can interpret the monoid. For instance, subbing in Sum Int gives you a bag-like trie:

type TrieBag a = Trie a (Sum Int)
lookup :: (Ord a, Foldable f) => f a -> TrieBag a -> Sum Int
insert :: (Ord a, Foldable f) 
       => f a -> Sum Int -> TrieBag a -> TrieBag a

This is a set which can store multiple copies of each member. Turned the other way around, a map which stores many values for each key looks like this:

type TrieBin a b = Trie a [b]
lookup :: (Ord a, Foldable f) => f a -> TrieBin a b -> [b]
insert :: (Ord a, Foldable f) 
       => f a -> [b] -> TrieBin a b -> TrieBin a b

This method so far isn’t really satisfying, though. Really, the insert signatures should look like this:

insert :: (Ord a, Foldable f) 
       => f a -> b -> TrieMap a b -> TrieMap a b
insert :: (Ord a, Foldable f)
       => f a -> b -> TrieBin a b -> TrieBin a b

Modifying insert slightly, you can get exactly that:

insert :: (Foldable f, Ord a, Applicative c, Monoid (c b)) 
       => f a -> b -> Trie a (c b) -> Trie a (c b)
insert xs v = foldr f b xs where
  b (Trie p c) = Trie (pure v <> p) c
  f e a (Trie n c) = Trie n (Map.alter (Just . a . fold) e c)

pure from Applicative is needed for the “embedding”.

Similarly, the “inserting” for the set-like types isn’t really right. The value argument is out of place. This should be the signature:

add :: (Ord a, Foldable f) 
    => f a -> TrieSet a -> TrieSet a
add :: (Ord a, Foldable f)
    => f a -> TrieBin a -> TrieBin a

In particular, while we have an “empty” thing (0, False) for monoids, we need a “one” thing (1, True) for this function. A semiring² gives this exact method:

class Monoid a => Semiring a where
  one :: a
  mul :: a -> a -> a
  
instance Num a => Semiring (Sum a) where
  one = 1
  mul = (*)

instance Semiring Any where
  one = Any True
  Any x `mul` Any y = Any (x && y)

This class is kind of like a combination of both monoid wrappers for both Int and Bool. You could take advantage of that:


class (Monoid add, Monoid mult)
  => SemiringIso a add mult | a -> add, a -> mult where
    toAdd    :: a -> add
    fromAdd  :: add -> a
    toMult   :: a -> mult
    fromMult :: mult -> a
  
(<+>), (<.>) :: SemiringIso a add mult => a -> a -> a

x <+> y = fromAdd  (toAdd  x <> toAdd  y)
x <.> y = fromMult (toMult x <> toMult y)

instance SemiringIso Int (Sum Int) (Product Int) where
  toAdd    = Sum
  fromAdd  = getSum
  toMult   = Product
  fromMult = getProduct

instance SemiringIso Bool Any All where
  toAdd    = Any
  fromAdd  = getAny
  toMult   = All
  fromMult = getAll

But it seems like overkill.

Anyway, assuming that we have the functions from Semiring, here’s the add function:

add :: (Foldable f, Ord a, Semiring b) 
    => f a -> Trie a b -> Trie a b
add xs = foldr f b xs where
  b (Trie p c) = Trie (one <> p) c
  f e a (Trie n c) = 
    Trie n (Map.alter (Just . a . fold) e c)

Now, expressions can be built up without specifying the specific monoid implementation, and the whole behaviour can be changed with a type signature:

instance (Ord a, Semiring b) => IsList (Trie a b) where
  type Item (Trie a b) = [a]
  fromList = foldr add mempty
  toList = undefined
ans :: Semiring b => b

ans = lookup "abc" (fromList ["abc", "def", "abc", "ghi"])

ans :: Sum Int
Sum {getSum = 2}

ans :: Any
Any {getAny = True}

Slightly fuller implementations of all of these are available here.

Kind of like program inference in lieu of type inference ↩︎
This isn’t really a very good definition of semiring. While Haskell doesn’t have this class in base, Purescript has it in their prelude.↩︎

Lenses are Static Selectors

Donnacha Oisín Kidney — Thu, 16 Jun 2016 00:00:00 UT

Posted on June 16, 2016

Tags: Swift

So I don’t really know what KVC is, or much about performSelector functions. This blogpost, from Brent Simmons, let me know a little bit about why I would want to use them.

It centred around removing code repetition of this type:

if localObject.foo != serverObject.foo {
  localObject.foo = serverObject.foo
}

if localObject.bar != serverObject.bar {
  localObject.bar = serverObject.bar // There was an (intentional)
}                                    // bug here in the original post

To clean up the code, Brent used selector methods. At first, I was a little uncomfortable with the solution. As far as I could tell, the basis of a lot of this machinery used functions with types like this:

func get(fromSelector: String) -> AnyObject?
func set(forSelector: String) -> ()

Which seems to be extremely dynamic. Stringly-typed and all that. Except that there are two different things going on here. One is the dynamic stuff; the ability to get rid of types when you need to. The other, though, has nothing to do with types. The other idea is being able to pass around something which can access the property (or method) of an object. Let’s look at the code that was being repeated:

if localObject.foo != serverObject.foo {
  localObject.foo = serverObject.foo
}

if localObject.bar != serverObject.bar {
  localObject.bar = serverObject.bar
}

The logical, obvious thing to do here is try refactor out the common elements. In fact, the only things that differ between the two actions above are the foo and bar. It would be great to be able to write a function like this:

func checkThenUpdate(selector) {
  if localObject.selector != serverObject.selector {
    localObject.selector = serverObject.selector
  }
}

And then maybe a single line like this:

[foo, bar, baz].forEach(checkThenUpdate)

That’s pretty obviously better. It’s just good programming: when faced with repetition, find the repeated part, and abstract it out. Is it more dynamic than the repetition, though? I don’t think so. All you have to figure out is an appropriate type for the selector, and you can keep all of your static checking. To me, it seems a lot like a lens:

struct Lens<Whole, Part> {
  let get: Whole -> Part
  let set: (Whole, Part) -> Whole
}

(This is a lens similar to the ones used in the data-lens library, in contrast to van Laarhoven lenses, or LensFamilies. LensFamilies are used in the lens package, and they allow you to change the type of the Part. They’re also just normal functions, rather than a separate type, so you can manipulate them in a pretty standard way. Swift’s type system isn’t able to model those lenses, though, unfortunately.) It has two things: a getter and a setter. The getter is pretty obvious: it takes the object, and returns the property. The setter is a little more confusing. It’s taking an object, and the new property you want to stick in to the object, and returns the object with that property updated. For instance, if we were to make a Person:

struct LocalPerson {
  var age: Int
  var name: String
}

We could then have a lens for the name field like this:

let localName: Lens<LocalPerson,String> = Lens(
  get: { p in p.name },
  set: { (oldPerson,newName) in
    var newPerson = oldPerson
    newPerson.name = newName
    return newPerson
  }
)

And you’d use it like this:

let caoimhe = LocalPerson(age: 46, name: "caoimhe")
localName.get(caoimhe) // 46
localName.set(caoimhe, "breifne") // LocalPerson(age: 46, name: "breifne")

Straight away, we’re able to do (something) like the checkThenUpdate function:

func checkThenUpdate
  <A: Equatable>
  (localLens: Lens<LocalPerson,A>, serverLens: Lens<ServerPerson,A>) {
  let serverProp = serverLens.get(serverObject)
  if localLens.get(localObject) != serverProp {
    localObject = localLens.set(localObject,serverProp)
  }
}

And it could be called pretty tersely:

checkThenUpdate(localName, serverLens: serverName)

The biggest problem with this approach, obviously, is the boilerplate. In Haskell, that’s solved with Template Haskell, so the lens code is generated for you. (I’d love to see something like that in Swift) There’s a protocol-oriented spin on lenses, also. One of the variants on lenses in Haskell are called “classy-lenses”. That’s where, instead of just generating a lens with the same name as the field it looks into, you generate a typeclass (protocol) for anything with that lens. In Swift, it might work something like this:

struct Place {
  var name: String
}

// Instead of just having a lens for the name field, have a whole protocol
// for things with a name field:

protocol HasName {
  associatedtype Name
  static var name: Lens<Self,Name> { get }
  var name: Name { get set }
}

// Because the mutable property is included in the protocol, you can rely on
// it in extensions:

extension HasName {
  static var name: Lens<Self,Name> {
    return Lens(
      get: {$0.name},
      set: { (w,p) in 
        var n = w
        n.name = p
        return n
      }
    )
  }
  var name: Name {
    get { return Self.name.get(self) }
    set { self = Self.name.set(self,newValue) }
  }
}

// This way, you can provide either the lens or the property, and you get the
// other for free.

extension Place: HasName {}

// Then, you can rely on that protocol, and all of the types:

func checkEqualOnNames
  <A,B where A: HasName, B: HasName, A.Name: Equatable, A.Name == B.Name>
  (x: A, _ y: B) -> Bool {
    return x.name == y.name
}

This protocol lets you do a kind of static respondsToSelector, with all of the types intact. Other people have spoken about the other things you can do with lenses in Swift (Brandon Williams - Lenses in Swift), like composing them together, chaining operations, etc. (One other thing they can emulate is method cascading) Unfortunately, in current Swift, the boilerplate makes all of this a little unpleasant. Still, they’re an interesting idea, and they show how a good type system needn’t always get in the way.

Folding Two Things at Once

Donnacha Oisín Kidney — Sun, 17 Apr 2016 00:00:00 UT

Posted on April 17, 2016

Tags: Haskell, Recursion Schemes

There’s a whole family of Haskell brainteasers surrounding one function: foldr. The general idea is to convert some function on lists which uses recursion into one that uses foldr. map, for instance:

map :: (a -> b) -> [a] -> [b]
map f = foldr (\e a -> f e : a) []

Some can get a little trickier. dropWhile, for instance. (See here and here for interesting articles on that one in particular.)

dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p = fst . foldr f ([],[]) where
  f e ~(xs,ys) = (if p e then xs else zs, zs) where zs = e : ys

Zip

One function which was a little harder to convert than it first seemed was zip.

Here’s the first (non) solution:

zip :: [a] -> [b] -> [(a,b)]
zip = foldr f (const []) where
  f x xs (y:ys) = (x,y) : xs ys
  f _ _  [] = []

The problem with the above isn’t that it doesn’t work: it does. The problem is that it’s not really using foldr. It’s only using it on the first list: there’s still a manual uncons being performed on the second. Ideally, I would want the function to look something like this:

zip :: [a] -> [b] -> [(a,b)]
zip xs ys = foldr f (\_ _ -> []) xs (foldr g (const []) ys)

The best solution I found online only dealt with Folds, not Foldables. You can read it here.

Recursive Types

Reworking the solution online for Foldables, the initial intuition is to have the foldr on the ys produce a function which takes an element of the xs, and returns a function which takes an element of the xs, and so on, finally returning the created list. The problem with that approach is the types involved:

zip :: [a] -> [b] -> [(a,b)]
zip xs = foldr f (const []) xs . foldr g (\_ _ -> []) where
  g e2 r2 e1 r1 = (e1,e2) : (r1 r2)
  f e r x = x e r

You get the error:

Occurs check: cannot construct the infinite type: t0 ~ a -> (t0 -> [(a, b)]) -> [(a, b)].

Haskell’s typechecker doesn’t allow for infinitely recursive types.

You’ll be familiar with this problem if you’ve ever tried to encode the Y-combinator, or if you’ve fiddled around with the recursion-schemes package. You might also be familiar with the solution: a newtype, encapsulating the recursion. In this case, the newtype looks very similar to the signature for foldr:

newtype RecFold a b = 
  RecFold { runRecFold :: a -> (RecFold a b -> b) -> b }

Now you can insert and remove the RecFold wrapper, helping the typechecker to understand the recursive types as it goes:

zip :: [a] -> [b] -> [(a,b)]
zip xs =
  foldr f (const []) xs . RecFold . foldr g (\_ _ -> []) where
    g e2 r2 e1 r1 = (e1,e2) : (r1 (RecFold r2))
    f e r x = runRecFold x e r

As an aside, the performance characteristics of the newtype wrapper are totally opaque to me. There may be significant improvements by using coerce from Data.Coerce, but I haven’t looked into it.

Generalised Zips

The immediate temptation from the function above is to generalise it. First to zipWith, obviously:

zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith c xs =
  foldr f (const []) xs . RecFold . foldr g (\_ _ -> []) where
    g e2 r2 e1 r1 = c e1 e2 : (r1 (RecFold r2))
    f e r x = runRecFold x e r

What’s maybe a little more interesting, though, would be a foldr on two lists. Something which folds through both at once, using a supplied combining function:

foldr2 :: (Foldable f, Foldable g)
       => (a -> b -> c -> c)
       -> c -> f a -> g b -> c
foldr2 c i xs =
  foldr f (const i) xs . RecFold . foldr g (\_ _ -> i) where
    g e2 r2 e1 r1 = c e1 e2 (r1 (RecFold r2))
    f e r x = runRecFold x e r

Of course, once you can do two, you can do three:

foldr3 :: (Foldable f, Foldable g, Foldable h)
       => (a -> b -> c -> d -> d)
       -> d -> f a -> g b -> h c -> d
foldr3 c i xs ys =
  foldr f (const i) xs . RecFold . foldr2 g (\_ _ -> i) ys where
    g e2 e3 r2 e1 r1 = c e1 e2 e3 (r1 (RecFold r2))
    f e r x = runRecFold x e r

And so on.

There’s the added benefit that the above functions work on much more than just lists.

Catamorphisms

Getting a little formal about the above functions, a fold can be described as a catamorphism. This is a name for a pattern of breaking down some recursive structure. There’s a bunch of them in the recursion-schemes package. The question is, then: can you express the above as a kind of catamorphism? Initially, using the same techniques as before, you can:

newtype RecF f a = RecF { unRecF :: Base f (RecF f a -> a) -> a }

zipo :: (Functor.Foldable f, Functor.Foldable g)
     => (Base f (RecF g c) -> Base g (RecF g c -> c) -> c)
     -> f -> g -> c
zipo alg xs ys = cata (flip unRecF) ys (cata (RecF . alg) xs)

Then, coming full circle, you get a quite nice encoding of zip:

zip :: [a] -> [b] -> [(a,b)]
zip = zipo alg where
  alg Nil _ = []
  alg _ Nil = []
  alg (Cons x xs) (Cons y ys) = (x, y) : ys xs

However, the RecF is a little ugly. In fact, it’s possible to write the above without any recursive types. (It’s possible that you could do the same with foldr2 as well, but I haven’t figured it out yet)

zipo :: (Functor.Foldable f, Functor.Foldable g)
     => (Base f (g -> c) -> Base g g -> c)
     -> f -> g -> c
zipo alg = cata (\x -> alg x . project)

And the new version of zip has a slightly more natural order of arguments:

zip :: [a] -> [b] -> [(a,b)]
zip = zipo alg where
  alg Nil _ = []
  alg _ Nil = []
  alg (Cons x xs) (Cons y ys) = (x,y) : xs ys

Zipping Into

There’s one more issue, though, that’s slightly tangential. A lot of the time, the attraction of rewriting functions using folds and catamorphisms is that the function becomes more general: it no longer is restricted to lists. For zip, however, there’s still a pesky list left in the signature:

zip :: (Foldable f, Foldable g) => f a -> g b -> [(a,b)]

It would be a little nicer to be able to zip through something preserving the structure of one of the things being zipped through. For no reason in particular, let’s assume we’ll preserve the structure of the first argument. The function will have to account for the second argument running out before the first, though. A Maybe can account for that:

zipInto :: (Foldable f, Foldable g) 
        => (a -> Maybe b -> c) 
        -> f a -> g b -> f c

If the second argument runs out, Nothing will be passed to the combining function.

It’s clear that this isn’t a fold over the first argument, it’s a traversal. A first go at the function uses the state monad, but restricts the second argument to a list:

zipInto :: Traversable f => (a -> Maybe b -> c) -> f a -> [b] -> f c
zipInto c xs ys = evalState (traverse f xs) ys where
  f x = do
    h <- gets uncons
    case h of 
      Just (y,t) -> do 
        put t
        pure (c x (Just y))
      Nothing -> pure (c x Nothing)

That code can be cleaned up a little:

zipInto :: Traversable f => (a -> Maybe b -> c) -> f a -> [b] -> f c 
zipInto c = evalState . traverse (state . f . c) where
  f x [] = (x Nothing, [])
  f x (y:ys) = (x (Just y), ys)

But really, the uncons needs to go. Another newtype wrapper is needed, and here’s the end result:

newtype RecAccu a b =
  RecAccu { runRecAccu :: a -> (RecAccu a b, b) }
  
zipInto :: (Traversable t, Foldable f)
        => (a -> Maybe b -> c) -> t a -> f b -> t c
zipInto f xs =
  snd . flip (mapAccumL runRecAccu) xs . RecAccu . foldr h i where
    i e = (RecAccu i, f e Nothing)
    h e2 a e1 = (RecAccu a, f e1 (Just e2))

2048 in Python

Donnacha Oisín Kidney — Tue, 20 Oct 2015 00:00:00 UT

Posted on October 20, 2015

Tags:

A simple implementation of the game 2048 in Python, using ncurses.

It supports different “bases” (other than 2) as well as colors, and uses a kind of Python-y functional style.

Minus comments, the whole thing is 70 lines.

#-----------------------------functional-helpers-------------------------#

from functools import reduce, partial

def compose(*funcs):

  """
  Mathematical function composition.
  compose(h, g, f)(x) => h(g(f(x)))
  """

  return reduce(lambda a,e: lambda x: a(e(x)), funcs, lambda x: x)

#-----------------------------------base---------------------------------#

# The base determines three things:
#  - The number of squares which need to be in a row to coalesce (= base)
#  - The length of the side of the board (= base^2)
#  - The number added to a random blank box on the board at the beginning
#    of every turn. (The seed) (90% of the time, the number added will be
#    the base, but 10% of the time, it will be the square of the base)
#  - The number of seeds added at evey turn (= 2^(base - 2))
#
# Normal 2048 has a base of 2.

base = int(input("Choose a base. (2 for normal 2048)\n> "))

#-----------------------------------rand---------------------------------#

def addn(board):

  """
  Inserts n seeds into random, empty positions in board. Returns board.
  n = 2^(base - 2)
  The  seed is equal to base 90% of the time. 10% of the time, it is
  equal to the square of the base.
  """

  from random import randrange, sample

  inds    = range(base**2)
  empties = [(y,x) for y in inds for x in inds if not board[y][x]]
  for y,x in sample(empties,2**(base-2)):
    board[y][x] = base if randrange(10) else base**2
  return board

#----------------------------------squish--------------------------------#

from itertools import count, groupby, starmap

def squish(row):

  """
  Returns a list, the same length as row, with the contents
  "squished" by the rules of 2048.
  Boxes are coalesced by adding their values together.
  Boxes will be coalesced iff:
   - They are adjacent, or there are only empty boxes between them.
   - The total number of boxes is equal to the base.
   - All the values of the boxes are equal.
  For base 2:
  [2][2][ ][ ] -> [4][ ][ ][ ]
  [2][2][2][2] -> [4][4][ ][ ]
  [4][ ][4][2] -> [8][2][ ][ ]
  [4][2][4][2] -> [4][2][4][2]
  For base 3:
  [3][ ][ ][3][ ][ ][3][ ][ ] -> [9][ ][ ][ ][ ][ ][ ][ ][ ]
  [3][3][3][3][3][3][3][3][3] -> [9][9][9][ ][ ][ ][ ][ ][ ]
  [3][3][3][9][9][ ][ ][ ][ ] -> [9][9][9][ ][ ][ ][ ][ ][ ]
  Keyword arguments:
  row -- A list, containing a combination of numbers and None
  (representing empty boxes)
  """

  r = []
  for n,x in starmap(lambda n, a: (n, sum(map(bool,a))),
                     groupby(filter(bool, row))):
    r += ([n*base] * (x//base)) + ([n] * (x%base))
  return r + ([None] * (base**2 - len(r)))

#----------------------------matrix-manipulation-------------------------#

# Transposes an iterable of iterables
# [[1, 2], -> [[1, 3],
#  [3, 4]]     [2, 4]]

def transpose(l): return [list(x) for x in zip(*l)]

# Flips horizontally an iterable of lists
# [[1, 2], -> [[2, 1],
#  [3, 4]]     [4, 3]]

flip = partial(map, reversed)

# transforms an iterable of iterables into a list of lists

thunk = compose(list, partial(map, list))

#----------------------------------moves---------------------------------#

# The move functions take a board as their argument, and return the board
# "squished" in a given direction.

moveLeft  = compose(thunk, partial(map, squish), thunk)
moveRight = compose(thunk, flip, moveLeft, flip)
moveUp    = compose(transpose, moveLeft, transpose)
moveDown  = compose(transpose, moveRight, transpose)

#-------------------------------curses-init------------------------------#
try:
    import curses

    screen = curses.initscr()
    curses.noecho()           # Don't print pressed keys
    curses.cbreak()           # Don't wait for enter
    screen.keypad(True)
    curses.curs_set(False)    # Hide cursor

#----------------------------------keymap--------------------------------#

    # A map from the arrow keys to the movement functions

    moves = {curses.KEY_RIGHT: moveRight,
            curses.KEY_LEFT : moveLeft ,
            curses.KEY_UP   : moveUp   ,
            curses.KEY_DOWN : moveDown }

#----------------------------------color---------------------------------#

    curses.start_color()
    curses.use_default_colors()
    curses.init_pair(1, curses.COLOR_WHITE, -1) # Border color

    def colorfac():

        """Initializes a color pair and returns it (skips black)"""

        for i,c in zip(count(2),(c for c in count(1) if c!=curses.COLOR_BLACK)):
            curses.init_pair(i, c, -1)
            yield curses.color_pair(i)

    colorgen = colorfac()

    from collections import defaultdict

    # A cache of colors, with the keys corresponding to numbers on the board.

    colors = defaultdict(lambda: next(colorgen))

#---------------------------printing-the-board---------------------------#

    size = max(11 - base*2, 3) # box width

    def printBoard(board):

        def line(b,c): return b + b.join([c*(size)]*len(board)) + b
        border, gap = line("+","-"), line("|"," ")
        pad = "\n" + "\n".join([gap]*((size-2)//4)) if size > 5 else ""
        screen.addstr(0, 0, border, curses.color_pair(1))
        for row in board:
            screen.addstr(pad + "\n|", curses.color_pair(1))
            for e in row:
                if e: screen.addstr(str(e).center(size), colors[e])
                else: screen.addstr(" " * size)
                screen.addstr("|", curses.color_pair(1))
            screen.addstr(pad + "\n" + border, curses.color_pair(1))

#----------------------------------board---------------------------------#

    # The board is a list of n lists, each of length n, where n is the base
    # squared. Empty boxes are represented by None. The starting board has
    # one seed.
    board = addn([[None for _ in range(base**2)] for _ in range(base**2)])
    printBoard(board)

#----------------------------------game-loop-----------------------------#
    # The main game loop. Continues until there are not enough empty spaces
    # on the board, or "q" is pressed.

    for char in filter(moves.__contains__, iter(screen.getch, ord("q"))):
        moved = moves[char](board)
        if sum(not n for r in moved for n in r) < 2**(base-2): break
        if moved != board: board = addn(moved)
        printBoard(board)

#--------------------------------clean-up--------------------------------#
finally:
    curses.nocbreak()     # Wait for enter
    screen.keypad(0)      # Stop arrow-key handling
    curses.echo()         # Print all keyboard input
    curses.curs_set(True) # Show cursor
    curses.endwin()       # Return to normal prompt