I was working on some performance-intensive stuff recently, and I ran into the issue of sorting very small numbers of values (think 3, 4, or 5).
The standard way to do this is with sorting networks. The way I’ll be using doesn’t actually perform any parallelism (unfortunately), but it is a clean way to write the networks in Haskell without too much repetition.
This website will generate an optimal sorting network for your given size, and the output (for 3) looks like this:
[[1,2]]
[[0,2]]
[[0,1]]
Each pair of indices represents a “compare-and-swap” operation: the first line means “compare the value at index 1 to the value at index 2; if the first is bigger, swap them”. For 5, the network looks like this:
[[0,1],[2,3]]
[[0,2],[1,3]]
[[1,2],[0,4]]
[[1,4]]
[[2,4]]
[[3,4]]
Pairs on the same line can be performed in parallel.
For our case, I’m going to be looking at sorting tuples, but the technique can easily be generalized to vectors, etc.
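Before the tuple version, it may help to see a network run in the most naive way possible. This list-based interpreter is my own illustration, not the code this post builds towards (and its !! indexing makes it slow):

```haskell
import Data.List (foldl')

-- A network is a list of compare-and-swap index pairs.
type Network = [(Int, Int)]

-- Run a network over a list, performing each compare-and-swap in order.
runNetwork :: Ord a => Network -> [a] -> [a]
runNetwork net xs0 = foldl' step xs0 net
  where
    step xs (i, j)
      | xs !! i <= xs !! j = xs
      | otherwise = swapAt i j xs
    swapAt i j xs =
      [ if k == i then xs !! j else if k == j then xs !! i else x
      | (k, x) <- zip [0 ..] xs ]

-- The optimal 3-element network from above:
network3 :: Network
network3 = [(1, 2), (0, 2), (0, 1)]
```

Running runNetwork network3 [3,1,2] walks the three compare-and-swaps and yields the sorted list.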
The first trick is to figure out how to do “swapping”: we don’t want mutation, so what we can do instead is swap the reference to some value, by shadowing its name. In other words:
swap2 :: (a -> a -> Bool) -> a -> a -> (a, a)
swap2 lte x y
  | lte x y   = (x, y)
  | otherwise = (y, x)

sort3 :: (a -> a -> Bool) -> (a,a,a) -> (a,a,a)
sort3 lte (_0,_1,_2)
  = case swap2 lte _1 _2 of
      (_1, _2) -> case swap2 lte _0 _2 of
        (_0, _2) -> case swap2 lte _0 _1 of
          (_0, _1) -> (_0, _1, _2)
The indentation is hard to read, though, and the wrapping and unwrapping of tuples makes me nervous about the performance (although it may be inlined). The next step is to Church-encode the pairs returned:
swap2 :: (a -> a -> Bool) -> a -> a -> (a -> a -> b) -> b
swap2 lte x y k
  | lte x y   = k x y
  | otherwise = k y x

sort3 :: (a -> a -> Bool) -> (a,a,a) -> (a,a,a)
sort3 lte (_0,_1,_2)
  = swap2 lte _1 _2 $ \_1 _2 ->
    swap2 lte _0 _2 $ \_0 _2 ->
    swap2 lte _0 _1 $ \_0 _1 ->
    (_0,_1,_2)
Then, to get this to compile down to efficient code, we can make judicious use of inline from GHC.Exts:
import GHC.Exts (inline)

swap2 :: (a -> a -> Bool) -> a -> a -> (a -> a -> b) -> b
swap2 lte x y k
  | inline lte x y = inline k x y
  | otherwise      = inline k y x
{-# INLINE swap2 #-}

sort3 :: (a -> a -> Bool) -> (a, a, a) -> (a, a, a)
sort3 lte (_0,_1,_2)
  = swap2 lte _1 _2 $ \_1 _2 ->
    swap2 lte _0 _2 $ \_0 _2 ->
    swap2 lte _0 _1 $ \_0 _1 ->
    (_0,_1,_2)
{-# INLINE sort3 #-}
And to see if this really does make efficient code, let’s look at the core (cleaned up):
sort3
  = \ (lte :: a -> a -> Bool)
      (ds :: (a, a, a)) ->
      case ds of wild_X8 (_0, _1, _2) ->
        case lte _1 _2 of
          False ->
            case lte _0 _1 of
              False -> (_2, _1, _0)
              True ->
                case lte _0 _2 of
                  False -> (_2, _0, _1)
                  True -> (_0, _2, _1)
          True ->
            case lte _0 _2 of
              False ->
                case lte _2 _1 of
                  False -> (_1, _2, _0)
                  True -> (_2, _1, _0)
              True ->
                case lte _0 _1 of
                  False -> (_1, _0, _2)
                  True -> wild_X8
Fantastic! When we specialize to Int, we get all of the proper unpacking:
sort3Int :: (Int, Int, Int) -> (Int, Int, Int)
sort3Int = inline sort3 (<=)
Core (with just the variable names cleaned up this time):
sort3Int
  = \ (w :: (Int, Int, Int)) ->
      case w of w_X { (_0, _1, _2) ->
      case _0 of w_0 { GHC.Types.I# _0U ->
      case _1 of w_1 { GHC.Types.I# _1U ->
      case _2 of w_2 { GHC.Types.I# _2U ->
      case GHC.Prim.<=# _1U _2U of {
        __DEFAULT ->
          case GHC.Prim.<=# _0U _1U of {
            __DEFAULT -> (w_2, w_1, w_0);
            1# ->
              case GHC.Prim.<=# _0U _2U of {
                __DEFAULT -> (w_2, w_0, w_1);
                1# -> (w_0, w_2, w_1)
              }
          };
        1# ->
          case GHC.Prim.<=# _0U _2U of {
            __DEFAULT ->
              case GHC.Prim.<=# _2U _1U of {
                __DEFAULT -> (w_1, w_2, w_0);
                1# -> (w_2, w_1, w_0)
              };
            1# ->
              case GHC.Prim.<=# _0U _1U of {
                __DEFAULT -> (w_1, w_0, w_2);
                1# -> w_X
              }
          }
      }
      }
      }
      }
      }
Now, for the real test: sorting 5-tuples, using the network above.
sort5 :: (a -> a -> Bool) -> (a,a,a,a,a) -> (a,a,a,a,a)
sort5 lte (_0,_1,_2,_3,_4)
  = swap2 lte _0 _1 $ \_0 _1 ->
    swap2 lte _2 _3 $ \_2 _3 ->
    swap2 lte _0 _2 $ \_0 _2 ->
    swap2 lte _1 _3 $ \_1 _3 ->
    swap2 lte _1 _2 $ \_1 _2 ->
    swap2 lte _0 _4 $ \_0 _4 ->
    swap2 lte _1 _4 $ \_1 _4 ->
    swap2 lte _2 _4 $ \_2 _4 ->
    swap2 lte _3 _4 $ \_3 _4 ->
    (_0,_1,_2,_3,_4)
{-# INLINE sort5 #-}
The core output from this is over 1000 lines long: you can see it (with the variable names cleaned up) here.
In my benchmarks, these functions are indeed quicker than their counterparts in vector, but I’m not confident enough in my knowledge of Haskell performance to make a strong statement about them.
The code from this post is available as a gist.
One of the most basic tools for use in type-level programming is the Peano definition of the natural numbers:
data ℕ
  = Z
  | S ℕ
Using the new TypeFamilyDependencies extension, we can parameterize the notion of “size”. I’m going to use the proportion symbol here:
type family (t ∷ k) ∝ (n ∷ ℕ) = (a ∷ Type) | a → t n k
Then, we can use it to provide an inductive class on the natural numbers:
class Finite n where
  induction ∷ t ∝ Z → (∀ k. t ∝ k → t ∝ S k) → t ∝ n

instance Finite Z where
  induction z _ = z
  {-# inline induction #-}

instance Finite n ⇒ Finite (S n) where
  induction z s = s (induction z s)
  {-# inline induction #-}
The induction function reads like the standard mathematical definition of induction: given a proof (a value) for the zero case, and a proof that whenever it holds for some number it also holds for that number’s successor, we can give you a proof for any finite number.
An added bonus here is that the size of something can usually be resolved at compile time, so any inductive function on it should also be resolved at compile time.
We can use it to provide the standard instances for basic length-indexed lists:
infixr 5 :-
data List n a where
  Nil  ∷ List Z a
  (:-) ∷ a → List n a → List (S n) a
Some instances for those lists are easy:
instance Functor (List n) where
  fmap _ Nil = Nil
  fmap f (x :- xs) = f x :- fmap f xs
However, for Applicative, we need some way to recurse on the size of the list. This is where induction comes in.
type instance '(List,a) ∝ n = List n a
This lets us write pure in a pleasingly simple way:
instance Finite n ⇒ Applicative (List n) where
  pure x = induction Nil (x :-)
But can we also write <*> using induction? Yes! Because we’ve factored out the induction itself, we just need to describe the notion of a “sized” function:
data a ↦ b
type instance ((x ∷ a) ↦ (y ∷ b)) ∝ n = (x ∝ n) → (y ∝ n)
Then we can write <*> like so:
instance Finite n ⇒ Applicative (List n) where
  pure x = induction Nil (x :-)
  (<*>) =
    induction
      (\Nil Nil → Nil)
      (\k (f :- fs) (x :- xs) → f x :- k fs xs)
What about the Monad instance? For that, we need a little bit of plumbing. The type signature of >>= is:

(>>=) ∷ m a → (a → m b) → m b
One of the parameters (the second a) doesn’t have a size: we’ll need to work around that with Const:
type instance (Const a ∷ ℕ → Type) ∝ n = Const a n
Using this, we can write our Monad instance:
head' ∷ List (S n) a → a
head' (x :- _) = x

tail' ∷ List (S n) a → List n a
tail' (_ :- xs) = xs

instance Finite n ⇒ Monad (List n) where
  xs >>= (f ∷ a → List n b) =
    induction
      (\Nil _ → Nil)
      (\k (y :- ys) fn → head' (fn (Const y)) :-
         k ys (tail' . fn . Const . getConst))
      xs
      (f . getConst ∷ Const a n → List n b)
Getting the above to work actually took a surprising amount of work: the crux is that the ∝ type family needs to be injective, so the “successor” proof can typecheck. Unfortunately, this means that every type can only have one notion of “size”. What I’d prefer is to be able to pass in a function indicating exactly how to get the size out of a type, which could change depending on the situation. So we could recurse on the first argument of a function, for instance, or just its second, or just the result. This would need either type-level lambdas (which would be cool) or generalized type family dependencies.
Pattern synonyms are an excellent Haskell extension. There are some very cool examples of their use out there, and I thought I’d add to the list.
Lists are the fundamental data structure for functional programmers. Unfortunately, once more specialized structures are required, you often have to switch over to an uncomfortable, annoying API which isn’t as pleasant or fun to use as cons and nil. With pattern synonyms, though, that’s not so! For instance, here’s how you would do it with a run-length-encoded list:
data List a
  = Nil
  | ConsN {-# UNPACK #-} !Int
          a
          (List a)

cons :: Eq a => a -> List a -> List a
cons x (ConsN i y ys)
  | x == y = ConsN (i+1) y ys
cons x xs = ConsN 1 x xs

uncons :: List a -> Maybe (a, List a)
uncons Nil = Nothing
uncons (ConsN 1 x xs) = Just (x, xs)
uncons (ConsN n x xs) = Just (x, ConsN (n-1) x xs)

infixr 5 :-
pattern (:-) :: Eq a => a -> List a -> List a
pattern x :- xs <- (uncons -> Just (x, xs))
  where
    x :- xs = cons x xs
{-# COMPLETE Nil, (:-) #-}

zip :: (Eq a, Eq b) => List a -> List b -> List (a,b)
zip (x :- xs) (y :- ys) = (x,y) :- zip xs ys
zip _ _ = Nil
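To see the compression at work, here’s a small self-contained demo of the same idea (with a derived Show instance, which the version above doesn’t need, and plain functions instead of the synonym):

```haskell
data List a
  = Nil
  | ConsN !Int a (List a)
  deriving (Show, Eq)

-- cons merges equal adjacent elements into one counted run
cons :: Eq a => a -> List a -> List a
cons x (ConsN i y ys)
  | x == y = ConsN (i + 1) y ys
cons x xs = ConsN 1 x xs

-- expand back out into an ordinary list
toList :: List a -> [a]
toList Nil = []
toList (ConsN n x xs) = replicate n x ++ toList xs
```

Building from "aabbb" with foldr cons Nil produces just two nodes: a run of two 'a's followed by a run of three 'b's.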
A little more useful would be to do the same with a heap:
data Tree a
  = Leaf
  | Node a (Tree a) (Tree a)

smerge :: Ord a => Tree a -> Tree a -> Tree a
smerge Leaf ys = ys
smerge xs Leaf = xs
smerge h1@(Node x lx rx) h2@(Node y ly ry)
  | x <= y    = Node x (smerge h2 rx) lx
  | otherwise = Node y (smerge h1 ry) ly

cons :: Ord a => a -> Tree a -> Tree a
cons x = smerge (Node x Leaf Leaf)

uncons :: Ord a => Tree a -> Maybe (a, Tree a)
uncons Leaf = Nothing
uncons (Node x l r) = Just (x, smerge l r)

infixr 5 :-
pattern (:-) :: Ord a => a -> Tree a -> Tree a
pattern x :- xs <- (uncons -> Just (x, xs))
  where
    x :- xs = cons x xs
{-# COMPLETE Leaf, (:-) #-}

sort :: Ord a => [a] -> [a]
sort = go . foldr (:-) Leaf
  where
    go Leaf = []
    go (x :- xs) = x : go xs
In fact, this pattern can be generalized, so any container-like thing with a cons-like operation can be worked with as you would a list. You can see the generalization in lens.
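As a rough sketch of what that generalization looks like (the class and method names here are my own invention; the real version in lens is built on prisms and handles constraints like the Eq and Ord above):

```haskell
-- Anything with a cons/uncons pair gets list-generic functions for free.
class Consable f where
  consF   :: a -> f a -> f a
  unconsF :: f a -> Maybe (a, f a)

instance Consable [] where
  consF = (:)
  unconsF []       = Nothing
  unconsF (x : xs) = Just (x, xs)

-- written once, works for every instance
toListF :: Consable f => f a -> [a]
toListF = maybe [] (\(x, xs) -> x : toListF xs) . unconsF
```

Note that the run-length list and the heap both need a constraint on the element type, which is exactly why the real generalization is shaped differently.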
One of the most confusing things I remember about learning Haskell early on was that the vast majority of monad examples didn’t work, because they were written pre-transformers. In other words, the state monad was defined like so:
newtype State s a = State { runState :: s -> (a, s) }
But in transformers nowadays (which is where you get State from if you import it in the normal way), the definition is:
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }
type State s = StateT s Identity
This results in some very confusing error messages when you try to run example code.
However, we can pretend that the change never happened, with a simple pattern synonym:
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }
type State s = StateT s Identity

runState :: State s a -> s -> (a, s)
runState xs = runIdentity . runStateT xs

pattern State :: (s -> (a, s)) -> State s a
pattern State x <- (runState -> x)
  where
    State x = StateT (Identity . x)
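A quick demonstration that old-style code typechecks again. I’m using an ordinary smart constructor here instead of the synonym, so the snippet runs without the PatternSynonyms extension, and tick is my own example name:

```haskell
import Data.Functor.Identity

newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }
type State s = StateT s Identity

runState :: State s a -> s -> (a, s)
runState xs = runIdentity . runStateT xs

-- the builder direction of the pattern, as a plain function
state :: (s -> (a, s)) -> State s a
state x = StateT (Identity . x)

-- pre-transformers example code, unchanged in spirit:
tick :: State Int Int
tick = state (\s -> (s, s + 1))
```

Running tick from state 0 returns the old state and increments it.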
If you want to write type-level proofs on numbers, you’ll probably end up using Peano numerals and singletons:
data Nat = Z | S Nat

data Natty n where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)

type family (+) (n :: Nat) (m :: Nat) :: Nat where
  Z + m = m
  S n + m = S (n + m)

plusZeroIsZero :: Natty n -> n + Z :~: n
plusZeroIsZero Zy = Refl
plusZeroIsZero (Sy n) = case plusZeroIsZero n of
  Refl -> Refl
Pretty cool, right? We can even erase the proof (if we really trust it) using rewrite rules:
{-# RULES
"plusZeroIsZero" forall n. plusZeroIsZero n = unsafeCoerce Refl
#-}
This isn’t ideal, but it’s getting there.
However, if we ever want to use these things at runtime (perhaps as a type-level indication of some data structure’s size), we’re going to rely on the value-level Peano addition, which is bad news.
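For contrast, here’s a sketch of the value-level addition we’d otherwise be stuck carrying around at runtime: structural recursion, linear in the first argument (addNatty and toInt are my own names for illustration):

```haskell
{-# LANGUAGE DataKinds, GADTs, TypeFamilies, TypeOperators #-}

data Nat = Z | S Nat

data Natty n where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)

type family (+) (n :: Nat) (m :: Nat) :: Nat where
  Z + m = m
  S n + m = S (n + m)

-- O(n) in the size of the first number: this is the "bad news"
addNatty :: Natty n -> Natty m -> Natty (n + m)
addNatty Zy     m = m
addNatty (Sy n) m = Sy (addNatty n m)

toInt :: Natty n -> Int
toInt Zy = 0
toInt (Sy n) = 1 + toInt n
```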
Not so with pattern synonyms!
data family The k :: k -> Type

class Sing (a :: k) where sing :: The k (a :: k)

data Nat = Z | S Nat

newtype instance The Nat n = NatSing Natural

instance Sing Z where
  sing = NatSing 0

instance Sing n => Sing (S n) where
  sing =
    (coerce :: (Natural -> Natural) -> (The Nat n -> The Nat (S n)))
      succ sing

data Natty n where
  ZZy :: Natty Z
  SSy :: The Nat n -> Natty (S n)

getNatty :: The Nat n -> Natty n
getNatty (NatSing n :: The Nat n) = case n of
  0 -> gcastWith (unsafeCoerce Refl :: n :~: Z) ZZy
  _ -> gcastWith (unsafeCoerce Refl :: n :~: S m) (SSy (NatSing (pred n)))

pattern Zy :: () => (n ~ Z) => The Nat n
pattern Zy <- (getNatty -> ZZy) where Zy = NatSing 0

pattern Sy :: () => (n ~ S m) => The Nat m -> The Nat n
pattern Sy x <- (getNatty -> SSy x) where Sy (NatSing x) = NatSing (succ x)
{-# COMPLETE Zy, Sy #-}

type family (+) (n :: Nat) (m :: Nat) :: Nat where
  Z + m = m
  S n + m = S (n + m)

-- | Efficient addition, with type-level proof.
add :: The Nat n -> The Nat m -> The Nat (n + m)
add = (coerce :: (Natural -> Natural -> Natural)
              -> The Nat n -> The Nat m -> The Nat (n + m)) (+)

-- | Proof on efficient representation.
addZeroRight :: The Nat n -> n + Z :~: n
addZeroRight Zy = Refl
addZeroRight (Sy n) = gcastWith (addZeroRight n) Refl
(unfortunately, incomplete pattern warnings don’t work here)
So you’ve got a tree type:
data Tree a
  = Tip
  | Bin a (Tree a) (Tree a)
And you’ve spent some time writing a (reasonably difficult) function on the tree:
showTree :: Show a => Tree a -> String
showTree Tip = ""
showTree (Bin x' ls' rs') = go True id xlen' ls'
                          $ showString xshw'
                          $ endc ls' rs'
                          $ showChar '\n'
                          $ go False id xlen' rs' ""
  where
    xshw' = show x'
    xlen' = length xshw'
    go _ _ _ Tip = id
    go up k i (Bin x ls rs) = branch True ls
                            . k
                            . pad i
                            . showChar (bool '└' '┌' up)
                            . showString xshw
                            . endc ls rs
                            . showChar '\n'
                            . branch False rs
      where
        xshw = show x
        xlen = length xshw
        branch d
          | d == up   = go d (k . pad i) (xlen + 1)
          | otherwise = go d (k . pad i . showChar '│') xlen
    endc Tip    Tip    = id
    endc Bin {} Tip    = showChar '┘'
    endc Tip    Bin {} = showChar '┐'
    endc Bin {} Bin {} = showChar '┤'
    pad = (++) . flip replicate ' '
But, for some reason or another, you need to add a field to your Bin constructor, to store the size of the subtree (for instance). Does this function have to change? No! Simply change the tree definition like so:
data Tree a
  = Tip
  | Bin' Int a (Tree a) (Tree a)

pattern Bin x ls rs <- Bin' _ x ls rs
{-# COMPLETE Tip, Bin #-}
And all the old code works!
This gets to the core of pattern synonyms: it’s another tool which we can use to separate implementation from API.
Say you’ve got a data type that has certain constraints on what values it can hold. You’re not writing a paper for ICFP, so expressing those constraints as a beautiful type isn’t required: you just want to only export the constructor and accessors, and write some tests to make sure that those functions always obey the constraints.
But once you do this you’ve lost something: pattern-matching. Let’s get it back with pattern synonyms!
As our simple example, our constraint is going to be “A list where the values are always ordered”:
newtype List a = List { getList :: [a] }

cons :: Ord a => a -> List a -> List a
cons x (List xs) = List (insert x xs)

infixr 5 :-
pattern (:-) :: Ord a => a -> List a -> List a
pattern x :- xs <- (List (x : xs))
  where
    x :- xs = cons x xs

pattern Nil = List []
{-# COMPLETE Nil, (:-) #-}
There’s a popular UK TV show called Countdown with a round where contestants have to get as close to some target number as possible by constructing an arithmetic expression from six random numbers.
You don’t have to use all of the numbers, and you’re allowed use four operations: addition, subtraction, multiplication, and division. Additionally, each stage of the calculation must result in a positive integer.
Here’s an example. Try to get to the target 586:
$100,25,1,5,3,10$
On the show, contestants get 30 seconds to think of an answer.
Solving it in Haskell was first explored in depth in Hutton (2002). There, a basic “generate-and-test” implementation was provided and proven correct.
As an optimization problem, there are several factors which will influence the choice of algorithm:
I’ll be focusing on the third point in this post, but we can add the second point in at the end. First, however, let’s write a naive implementation.
I can’t think of a simpler way to solve the problem than generate-and-test, so we’ll work from there. Testing is easy ((target ==) . eval), so we’ll focus on generation. The core function we’ll use for this is usually called “unmerges”:
unmerges [x,y] = [([x],[y])]
unmerges (x:xs) =
  ([x],xs) :
  concat
    [ [(x:ys,zs),(ys,x:zs)]
    | (ys,zs) <- unmerges xs ]
unmerges _ = []
It generates all possible 2-partitions of a list, ignoring order:
>>> unmerges "abc"
[("a","bc"),("ab","c"),("b","ac")]
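A quick sanity check on unmerges (my own observation, not from the post): a list of n distinct elements has 2^(n−1) − 1 unordered 2-partitions, and that’s exactly how many pairs the function produces.

```haskell
unmerges :: [a] -> [([a], [a])]
unmerges [x, y] = [([x], [y])]
unmerges (x : xs) =
  ([x], xs) :
  concat
    [ [(x : ys, zs), (ys, x : zs)]
    | (ys, zs) <- unmerges xs ]
unmerges _ = []

-- length (unmerges [1 .. n]) == 2 ^ (n - 1) - 1, for n >= 2
```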
I haven’t looked much into how to optimize this function or make it nicer, as we’ll be swapping it out later.
Next, we need to make the recursive calls:
allExprs :: (a -> a -> [a]) -> [a] -> [a]
allExprs _ [x] = [x]
allExprs c xs =
  [ e
  | (ys,zs) <- unmerges xs
  , y <- allExprs c ys
  , z <- allExprs c zs
  , e <- c y z ]
Finally, using the simple-reflect library, we can take a look at the output:
>>> allExprs (\x y -> [x+y,x*y]) [1,2] :: [Expr]
[1 + 2,1 * 2]
>>> allExprs (\x y -> [x+y]) [1,2,3] :: [Expr]
[1 + (2 + 3),1 + 2 + 3,2 + (1 + 3)]
Even at this early stage, we can actually already write a rudimentary solution:
countdown :: [Integer] -> Integer -> [Expr]
countdown xs targ =
  filter
    ((==) targ . toInteger)
    (allExprs
       (\x y -> [x,y,x+y,x*y])
       (map fromInteger xs))
>>> mapM_ print (countdown [100,25,1,5,3,10] 586)
1 + (100 * 5 + (25 * 3 + 10))
1 + (100 * 5 + 25 * 3 + 10)
1 + (25 * 3 + (100 * 5 + 10))
1 + 100 * 5 + (25 * 3 + 10)
100 * 5 + (1 + (25 * 3 + 10))
100 * 5 + (1 + 25 * 3 + 10)
100 * 5 + (25 * 3 + (1 + 10))
1 + (100 * 5 + 25 * 3) + 10
1 + 100 * 5 + 25 * 3 + 10
100 * 5 + (1 + 25 * 3) + 10
100 * 5 + 25 * 3 + (1 + 10)
1 + 25 * 3 + (100 * 5 + 10)
25 * 3 + (1 + (100 * 5 + 10))
25 * 3 + (1 + 100 * 5 + 10)
25 * 3 + (100 * 5 + (1 + 10))
As you can see from the output, there’s a lot of repetition. We’ll need to do some memoization to speed it up.
The normal way most programmers think about “memoization” is something like this:
memo_dict = {0: 0, 1: 1}

def fib(n):
    if n in memo_dict:
        return memo_dict[n]
    else:
        res = fib(n - 1) + fib(n - 2)
        memo_dict[n] = res
        return res
In other words, it’s a fundamentally stateful process. We need to mutate some mapping when we haven’t seen the argument before.
Using laziness, though, we can emulate the same behavior purely. Instead of mutating the mapping on function calls, we fill the whole thing at the beginning, and then index into it. As long as the mapping is lazy, it’ll only evaluate the function calls when they’re needed. We could use lists as our mapping to the natural numbers:
fibs = 0 : 1 : map fib [2..]
fib n = fibs !! (n-1) + fibs !! (n-2)
The benefit here is that we avoid the extra work of redundant calls. However, we pay for the speedup in three ways:

1. The mapping itself takes up extra memory.
2. Every call now pays for a lookup into the mapping.
3. While some types can be used efficiently as keys in a mapping (they’re Hashable, Ord, etc.), some can’t, and we can’t memoize those using this technique.

We’re going to look at a technique that allows us to somewhat mitigate 2 and 3 above, using something called a nexus.
The standard technique of memoization is focused on the arguments to the function, creating a concrete representation of them in memory to map to the results. Using nexuses, as described in Bird and Hinze (2003), we’ll instead focus on the function itself, creating a concrete representation of its call graph in memory. Here’s the call graph of Fibonacci:
                                            ┌fib(1)=1
                                   ┌fib(2)=1┤
                                   │        └fib(0)=0
                          ┌fib(3)=2┤
                          │        └fib(1)=1
                 ┌fib(4)=3┤
                 │        │        ┌fib(1)=1
                 │        └fib(2)=1┤
                 │                 └fib(0)=0
        ┌fib(5)=5┤
        │        │                 ┌fib(1)=1
        │        │        ┌fib(2)=1┤
        │        │        │        └fib(0)=0
        │        └fib(3)=2┤
        │                 └fib(1)=1
fib(6)=8┤
        │                          ┌fib(1)=1
        │                 ┌fib(2)=1┤
        │                 │        └fib(0)=0
        │        ┌fib(3)=2┤
        │        │        └fib(1)=1
        └fib(4)=3┤
                 │        ┌fib(1)=1
                 └fib(2)=1┤
                          └fib(0)=0
Turning that into a concrete datatype wouldn’t do us much good: it still has the massively redundant computations in it. However, we can recognize that entire subtrees are duplicates of each other: in those cases, instead of creating both subtrees, we could just create one and have each parent point to it^{1}:
        ┌fib(5)=5┬────────┬fib(3)=2┬────────┬fib(1)=1
fib(6)=8┤        │        │        │        │
        └────────┴fib(4)=3┴────────┴fib(2)=1┴fib(0)=0
This is a nexus. In Haskell, it’s not observably different from the other form, except that it takes up significantly less space. It’s also much quicker to construct.
If we use it to memoize fib, we’ll no longer be indexing on the argument: we’ll instead follow the relevant branch in the tree to the subcomputation, which is just chasing a pointer. It also means the argument doesn’t have to be constrained to any specific type. Here’s how you’d do it:
data Tree
  = Leaf
  | Node
      { val   :: Integer
      , left  :: Tree
      , right :: Tree }

fib :: Integer -> Integer
fib = val . go
  where
    go 0 = Node 0 Leaf Leaf
    go 1 = Node 1 (Node 0 Leaf Leaf) Leaf
    go n = node t (left t) where t = go (n-1)
    node l r = Node (val l + val r) l r
So this approach sounds amazing, right? No constraints on the argument type, no need to pay for indexing: why doesn’t everyone use it everywhere? The main reason is that figuring out a nexus for the call graph is hard. In fact, finding an optimal one is NP-hard in general (Steffen and Giegerich 2006).
The second problem is that it’s difficult to abstract out. The standard technique of memoization relies on building a mapping from keys to values: about as bread-and-butter as it gets in programming. What’s more, we already know how to say “values of this type can be used efficiently as keys in some mapping”: for Data.Map it’s Ord, for Data.HashMap it’s Hashable. All of this together means we can build a nice library for memoization which exports the two following functions:
memoHash :: Hashable a => (a -> b) -> (a -> b)
memoOrd  :: Ord a => (a -> b) -> (a -> b)
Building a nexus, however, is not bread-and-butter. On top of that, it’s difficult to say something like “recursive functions of this structure can be constructed using a nexus”. What’s the typeclass for that? In comparison to the signatures above, the constraint will need to be on the arrows, not the a. Even talking about the structure of recursive functions is regarded as somewhat of an advanced subject; that said, the recursion-schemes package allows us to do so, and even has facilities for constructing something like nexuses with histomorphisms (Tobin 2016). I’m still looking to see if there’s a library out there that does manage to abstract nexuses in an ergonomic way, so I’d love to hear if there is one (or if there’s some more generalized form which accomplishes the same).
That’s enough preamble. The nexus we want to construct for countdown is not going to memoize as much as possible: in particular, we’re only going to memoize the shape of the trees, not the operators used. This will massively reduce the memory overhead, and still give a decent speedup (Bird and Mu 2005, 11 “building a skeleton tree first”).
With that in mind, the ideal nexus looks something like this:
We can represent the tree in Haskell as a rose tree:
data Tree a
  = Node
      { root   :: a
      , forest :: Forest a
      }

type Forest a = [Tree a]
Constructing the nexus itself isn’t actually the most interesting part of this solution: consuming it is. We need to be able to go from the structure above into a list that’s the equivalent of unmerges. Doing a breadth-first traversal of the diagram above (without the top element) will give us:
$abc, abd, acd, bcd, ab, ac, bc, ad, bd, cd, a, b, c, d$
If you split that list in half and zip it with its reverse, you’ll get the output of unmerges.
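To see that claim concretely, here’s the level for the four-element input abcd written out as strings (my own check, with the traversal hard-coded rather than generated from the nexus):

```haskell
import Data.List (sort)

-- breadth-first traversal of the nexus for "abcd", minus the root
level :: [String]
level = [ "abc", "abd", "acd", "bcd"
        , "ab", "ac", "bc", "ad", "bd", "cd"
        , "a", "b", "c", "d" ]

-- split in half, zip one half against the reverse of the other
pairs :: [(String, String)]
pairs = zip front (reverse back)
  where
    (front, back) = splitAt (length level `div` 2) level
```

Every resulting pair is a 2-partition of "abcd", and there are 2^3 − 1 = 7 of them, matching unmerges.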
However, the breadth-first traversal of the diagram isn’t the same thing as the breadth-first traversal of the rose tree. The latter will traverse $abc, abd, acd, bcd$, and then the children of $abc$ ($ab,ac,bc$), and then the children of $abd$ ($ab,ad,bd$): and here’s our problem. We traverse $ab$ twice, because we can’t know that $abc$ and $abd$ are pointing to the same value. What we have to do is first prune the tree, removing duplicates, and then perform a breadth-first traversal on that.
Luckily, the duplicates follow a pattern, allowing us to remove them without having to do any equality checking. In each row, the first node has no duplicates in its children, the second’s first child is a duplicate, the third’s first and second children are duplicates, and so on. You should be able to see this in the diagram above. Adapting a little from the paper, we get an algorithm like this:
para :: (a -> [a] -> b -> b) -> b -> [a] -> b
para f b = go
  where
    go [] = b
    go (x:xs) = f x xs (go xs)

prune :: Forest a -> Forest a
prune ts = pruneAt ts 0
  where
    pruneAt = para f (const [])
    f (Node x []) t _ _ = Node x [] : t
    f (Node x us) _ a k =
      Node x (pruneAt (drop k us) k) : a (k + 1)
I went through this in a previous post, so this is the end solution:
breadthFirst :: Forest a -> [a]
breadthFirst ts = foldr f b ts []
  where
    f (Node x xs) fw bw = x : fw (xs:bw)
    b [] = []
    b q = foldl (foldr f) b q []
With the appropriate incantations, this is actually the fastest implementation I’ve found.
We can actually inline both of the above functions, fusing them together:
spanNexus :: Forest a -> [a]
spanNexus ts = foldr f (const b) ts 0 []
  where
    f (Node x us) fw k bw = x : fw (k+1) ((drop k us, k) : bw)
    b [] = []
    b qs = foldl (uncurry . foldr f . const) b qs []
So, now we can go from the tree to our list of splits. The next step is to convert that list into the output of unmerges, by zipping the reverse of the first half with the second half. We can use an algorithm described in Danvy and Goldberg (2005) to do the zipping and reversing:
fold xs n = go xs n (const [])
  where
    go xs 0 k = k xs
    go (x:xs) n k = go xs (n-2) (\(y:ys) -> (x,y) : k ys)
And we can inline the function which collapses those results into one:
fold xs n = go n xs (const [])
  where
    go 0 xss k = k xss
    go n (xs:xss) k =
      go (n-2) xss (\(ys:yss) -> [ z
                                 | x <- xs
                                 , y <- ys
                                 , z <- cmb x y
                                 ] ++ k yss)
And that’s all we need!
import qualified Data.Tree as Rose

data Tree a
  = Leaf Int a
  | Node [Tree a]
  deriving (Show,Eq,Functor)

enumerateTrees :: (a -> a -> [a]) -> [a] -> [a]
enumerateTrees _ [] = []
enumerateTrees cmb xs = (extract . steps . initial) xs
  where
    step = map nodes . group

    steps [x] = x
    steps xs = steps (step xs)

    initial = map (Leaf 1 . flip Rose.Node [] . pure)

    extract (Leaf _ x) = Rose.rootLabel x
    extract (Node [x]) = extract x

    group [_] = []
    group (Leaf _ x:vs) = Node [Leaf 2 [x, y] | Leaf _ y <- vs] : group vs
    group (Node u:vs) = Node (zipWith comb (group u) vs) : group vs

    comb (Leaf n xs) (Leaf _ x) = Leaf (n + 1) (xs ++ [x])
    comb (Node us) (Node vs) = Node (zipWith comb us vs)

    forest ts = foldr f (const b) ts 0 []
      where
        f (Rose.Node x []) fw !k bw = x : fw (k + 1) bw
        f (Rose.Node x us) fw !k bw = x : fw (k + 1) ((drop k us, k) : bw)
        b [] = []
        b qs = foldl (uncurry . foldr f . const) b qs []

    nodes (Leaf n x) = Leaf 1 (node n x)
    nodes (Node xs) = Node (map nodes xs)

    node n ts = Rose.Node (walk (2 ^ n - 2) (forest ts) (const [])) ts
      where
        walk 0 xss k = k xss
        walk n (xs:xss) k =
          walk (n-2) xss (\(ys:yss) -> [ z
                                       | x <- xs
                                       , y <- ys
                                       , z <- cmb x y
                                       ] ++ k yss)
The first thing to do for the Countdown solution is to figure out a representation for expressions. The one from simple-reflect is perfect for displaying the result, but we should memoize its calculation.
data Memoed
  = Memoed
      { expr   :: Expr
      , result :: Int
      }
Then, some helpers for building:
data Op = Add | Dif | Mul | Div

binOp f g x y = Memoed ((f `on` expr) x y) ((g `on` result) x y)

apply :: Op -> Memoed -> Memoed -> Memoed
apply Add x y = binOp (+) (+) x y
apply Dif x y
  | result y < result x = binOp (-) (-) x y
  | otherwise = binOp (-) (-) y x
apply Mul x y = binOp (*) (*) x y
apply Div x y = binOp div div x y
Finally, the full algorithm:
enumerateExprs :: [Int] -> [Memoed]
enumerateExprs = enumerateTrees cmb . map (\x -> Memoed (fromIntegral x) x)
  where
    cmb x y =
      nubs $
      x :
      y :
      [ apply op x y
      | op <- [Add, Dif, Mul, Div]
      , legal op (result x) (result y) ]

    legal Add _ _ = True
    legal Dif x y = x /= y
    legal Mul _ _ = True
    legal Div x y = x `mod` y == 0

    nubs xs = foldr f (const []) xs IntSet.empty
      where
        f e a s
          | IntSet.member (result e) s = a s
          | otherwise = e : a (IntSet.insert (result e) s)
countdown :: Int -> [Int] -> [Expr]
countdown targ = map expr . filter ((==) targ . result) . enumerateExprs
>>> (mapM_ print . reduction . head) (countdown 586 [100,25,1,5,3,10])
25 * 3 + 1 + (100 * 5 + 10)
75 + 1 + (100 * 5 + 10)
76 + (100 * 5 + 10)
76 + (500 + 10)
76 + 510
586
There are some optimizations going on here, taken mainly from Bird and Mu (2005):
So we’ve followed the paper, written the code: time to test. The specification of the function is relatively simple: calculate all applications of the commutative operator to some input, without recalculating subtrees.
We’ll need a free structure for the “commutative operator”:
data Tree a
  = Leaf a
  | Tree a :^: Tree a
  deriving (Foldable,Eq,Ord,Show)
Here’s the problem: it’s not commutative! We can remedy it by only exporting a constructor that creates the tree in a commutative way, and we can make it a pattern synonym so it looks normal:
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE PatternSynonyms #-}

module Commutative
  ( Tree(Leaf)
  , pattern (:*:)
  ) where

data Tree a
  = Leaf a
  | Tree a :^: Tree a
  deriving (Eq,Ord,Show,Foldable)

pattern (:*:) :: Ord a => Tree a -> Tree a -> Tree a
pattern xs :*: ys <- xs :^: ys where
  xs :*: ys
    | xs <= ys  = xs :^: ys
    | otherwise = ys :^: xs
{-# COMPLETE Leaf, (:*:) #-}
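A quick check of the smart constructor’s behaviour: building with the arguments in either order yields the same underlying tree, which is what makes the exported operation commutative. (This standalone snippet repeats the definitions so it can run on its own.)

```haskell
{-# LANGUAGE PatternSynonyms #-}

data Tree a
  = Leaf a
  | Tree a :^: Tree a
  deriving (Eq, Ord, Show)

-- the normalizing build: always put the smaller subtree on the left
pattern (:*:) :: Ord a => Tree a -> Tree a -> Tree a
pattern xs :*: ys <- xs :^: ys where
  xs :*: ys
    | xs <= ys  = xs :^: ys
    | otherwise = ys :^: xs

-- (Leaf 2 :*: Leaf 1) and (Leaf 1 :*: Leaf 2) are the same value
```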
Now we need to check if all applications are actually tested. First, to generate all trees:
allTrees :: Ord a => [a] -> Set (Tree a)
allTrees [x] = Set.singleton (Leaf x)
allTrees xs = Set.unions (map (uncurry f) (unmerges xs))
  where
    f ls rs = Set.fromList ((liftA2 (:*:) `on` (Set.toList . allTrees)) ls rs)

allSubTrees :: Ord a => [a] -> Set (Tree a)
allSubTrees [x] = Set.singleton (Leaf x)
allSubTrees xs =
  Set.unions (map (uncurry f . (allSubTrees *** allSubTrees)) (unmerges xs))
  where
    f ls rs =
      Set.unions
        [ls, rs, Set.fromList ((liftA2 (:*:) `on` Set.toList) ls rs)]
Then, to test:
prop_exhaustiveSearch :: Natural -> Bool
prop_exhaustiveSearch n =
  let src = [0 .. fromIntegral n]
      expect = allSubTrees src
      actual =
        Set.fromList
          (enumerateTrees
             (\xs ys -> [xs, ys, xs :*: ys])
             (map Leaf src))
  in expect == actual

prop_exhaustiveSearchFull :: Natural -> Bool
prop_exhaustiveSearchFull n =
  let src = [0 .. fromIntegral n]
      expect = Map.fromSet (const 1) (allTrees src)
      actual =
        freqs
          (enumerateTrees
             (\xs ys -> [xs :*: ys])
             (map Leaf src))
  in expect == actual
Testing for repeated calls is more tricky. Remember, the memoization is supposed to be unobservable: in order to see it, we’re going to have to use some unsafe operations.
traceSubsequences
  :: ((Tree Int -> Tree Int -> [Tree Int]) -> [Tree Int] -> [Tree Int])
  -> [Int]
  -> (Map (Tree Int) Int, [Tree Int])
traceSubsequences enm ints =
  runST $
  do ref <- newSTRef Map.empty
     let res = enm (combine ref) (map (conv ref) ints)
     traverse_ (foldr seq (pure ())) res
     intm <- readSTRef ref
     pure (intm, res)
  where
    combine ref xs ys = unsafeRunST ([xs :*: ys] <$ modifySTRef' ref (incr (xs :*: ys)))
    {-# NOINLINE combine #-}
    conv ref x = unsafeRunST (Leaf x <$ modifySTRef' ref (incr (Leaf x)))
    {-# NOINLINE conv #-}
    unsafeRunST cmp = unsafePerformIO (unsafeSTToIO cmp)
prop_noRepeatedCalls :: Property
prop_noRepeatedCalls =
  property $ sized $
  \n ->
     pure $
     let src = [0 .. n]
         (tint,tres) = fmap freqs (traceSubsequences enumerateTrees src)
         (fint,fres) = fmap freqs (traceSubsequences dummyEnumerate src)
     in counterexample
          (mapCompare (freqs (allSubTrees src)) tint)
          (all (1 ==) tint) .&&.
        counterexample (mapCompare tres fres) (tres == fres) .&&.
        (n > 2 ==> tint /= fint)
Here, dummyEnumerate is some function which performs the same task but doesn’t construct a nexus, so we can ensure that our tests really do catch faulty implementations.
Bird, Richard, and Ralf Hinze. 2003. “Functional Pearl: Trouble Shared is Trouble Halved.” In Proceedings of the 2003 ACM SIGPLAN Workshop on Haskell, 1–6. Haskell ’03. New York, NY, USA: ACM. doi:10.1145/871895.871896. http://doi.acm.org/10.1145/871895.871896.
Bird, Richard, and Shin-Cheng Mu. 2005. “Countdown: A case study in origami programming.” Journal of Functional Programming 15 (05) (August): 679. doi:10.1017/S0956796805005642. http://www.journals.cambridge.org/abstract_S0956796805005642.
Danvy, Olivier, and Mayer Goldberg. 2005. “There and Back Again.” BRICS Report Series 12 (3). doi:10.7146/brics.v12i3.21869. https://tidsskrift.dk/brics/article/view/21869.
Hutton, Graham. 2002. “The Countdown Problem.” J. Funct. Program. 12 (6) (November): 609–616. doi:10.1017/S0956796801004300. http://www.cs.nott.ac.uk/~pszgmh/countdown.pdf.
Okasaki, Chris. 2000. “Breadth-First Numbering: Lessons from a Small Exercise in Algorithm Design.” In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, 131–136. ICFP ’00. New York, NY, USA: ACM.
Steffen, Peter, and Robert Giegerich. 2006. “Table Design in Dynamic Programming.” Information and Computation 204 (9) (September): 1325–1345. doi:10.1016/j.ic.2006.02.006. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.85.601&rep=rep1&type=pdf.
Tobin, Jared. 2016. “Time Traveling Recursion Schemes.” jtobin.io. https://jtobin.io/timetravelingrecursion.
If you think that structure looks more like a funny linked list than a tree, that's because it is. Instead of talking about "left" and "right" branches, we could talk about the first and second elements in a list: in fact, this is exactly what's happening in the famous zipWith Fibonacci implementation (in reverse).
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
Or, in my favourite version:
fib n = fix ((:) 0 . scanl (+) 1) !! n
In contrast to the more common binary trees, in a rose tree every node can have any number of children.
data Tree a = Node
    { root   :: a
    , forest :: Forest a
    }

type Forest a = [Tree a]
One of the important manipulations of this data structure, which forms the basis for several other algorithms, is a breadth-first traversal. I'd like to go through a couple of techniques for implementing it, and, more generally, how you can often get away with using much simpler data structures if you really pinpoint the API you need from them.
As a general technique, Okasaki (2000) advises that a queue be used:
breadthFirst :: Tree a -> [a]
breadthFirst tr = go (singleton tr)
  where
    go q = case pop q of
      Nothing -> []
      Just (Node x xs, qs) -> x : go (qs `append` xs)
There are three functions left undefined there: singleton, pop, and append. They represent the API of our as-of-yet unimplemented queue, and their complexity will dictate the complexity of the overall algorithm. As a (bad) first choice, we could use simple lists, with the functions defined thus:
singleton x = [x]
pop (x:xs) = Just (x,xs)
pop [] = Nothing
append = (++)
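Putting the skeleton together with the naive list queue gives a small runnable version (a sketch; the example tree here is made up for illustration):

```haskell
-- Sketch: the queue-based skeleton specialised to the naive list queue.
data Tree a = Node { root :: a, forest :: [Tree a] }

breadthFirst :: Tree a -> [a]
breadthFirst tr = go [tr]
  where
    go []                = []
    go (Node x xs : qs) = x : go (qs ++ xs) -- the repeated append is the problem

-- A small example tree:
--      1
--     / \
--    2   3
--    |   |
--    4   5
example :: Tree Int
example = Node 1 [Node 2 [Node 4 []], Node 3 [Node 5 []]]
```

Here `breadthFirst example` is `[1,2,3,4,5]`: each level is visited in full before the next, but every `++` walks the whole remaining queue.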
Those repeated appends are bad news. The queue needs to be able to support popping from one side and appending from the other, which is something lists absolutely cannot do well.
We could swap in a more general queue implementation, possibly using Data.Sequence, or a pair of lists. But these are more complex and general than we need, so let’s try and pare down the requirements a little more.
First, we don’t need a pop: the go function can be expressed as a fold instead. Second, we don’t need every append to be immediately stuck into the queue, we can batch them, first appending to a structure that’s efficient for appends, and then converting that to a structure which is efficient for folds. In code:
breadthFirst :: Forest a -> [a]
breadthFirst ts = foldr f b ts []
  where
    f (Node x xs) fw bw = x : fw (xs : bw)

    b [] = []
    b qs = foldl (foldr f) b qs []
We’re consing instead of appending, but the consumption is being done in the correct direction anyway, because of the foldl
.
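As a quick check that the batching trick preserves the traversal order, here is the fold-based version again with a small example (the tree is made up for illustration):

```haskell
-- Sketch: the fold-based breadthFirst from the text, checked on a small forest.
-- The sub-forests are consed on (reversing them), and the foldl in `b`
-- consumes that reversed batch, restoring the original order.
data Tree a = Node { root :: a, forest :: Forest a }
type Forest a = [Tree a]

breadthFirst :: Forest a -> [a]
breadthFirst ts = foldr f b ts []
  where
    f (Node x xs) fw bw = x : fw (xs : bw)

    b [] = []
    b qs = foldl (foldr f) b qs []

example :: Forest Int
example = [Node 1 [Node 2 [Node 4 []], Node 3 [Node 5 []]]]
```

`breadthFirst example` evaluates to `[1,2,3,4,5]`, matching the queue-based version.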
So, next step: the levels function from Data.Tree. Instead of doing a breadth-first traversal, it returns the nodes at each level of the tree. Conceptually, every time we did the reverse above (the foldl), we will do a cons as well:
levels :: Forest a -> [[a]]
levels ts = foldl f b ts [] []
  where
    f k (Node x xs) ls qs = k (x : ls) (xs : qs)

    b _ [] = []
    b k qs = k : foldl (foldl f) b qs [] []
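Trying the function out on a small forest (the example is made up for illustration):

```haskell
-- Sketch: levels, as defined in the text, on a two-level-deep forest.
data Tree a = Node { root :: a, forest :: Forest a }
type Forest a = [Tree a]

levels :: Forest a -> [[a]]
levels ts = foldl f b ts [] []
  where
    f k (Node x xs) ls qs = k (x : ls) (xs : qs)

    b _ [] = []
    b k qs = k : foldl (foldl f) b qs [] []

example :: Forest Int
example = [Node 1 [Node 2 [Node 4 []], Node 3 [Node 5 []]]]
```

`levels example` is `[[1],[2,3],[4,5]]`: one list per depth, in left-to-right order.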
The original reason I started work on these problems was this issue in containers. It concerns the unfoldTreeM_BF function. An early go at rewriting it, inspired by levels above, looks like this:
It basically performs the same thing as the levels function, but builds the tree back up at the end using the run function. In order to do that, we store the length of each subforest on line 9, so that each node knows how much to take from each level.
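Since that early snippet isn't reproduced here, a naive level-batched version of the same idea might look like the following. This is a sketch, not the optimized code from the issue: it runs a whole level's actions in queue order, then splits the next level's results back out by taking lengths, exactly the length-taking the text describes.

```haskell
-- Sketch: naive breadth-first monadic unfold. Each level's actions run
-- left to right before any of the next level's; `rebuild` hands each node
-- back as many subtrees as it produced seeds (the length-taking step).
data Tree a = Node a [Tree a] deriving (Eq, Show)

unfoldForestM_BF :: Monad m => (b -> m (a, [b])) -> [b] -> m [Tree a]
unfoldForestM_BF _ [] = pure []
unfoldForestM_BF f ts = do
    ps <- traverse f ts                      -- run this level, in queue order
    let xs  = map fst ps                     -- labels for this level
        css = map snd ps                     -- each node's child seeds
    kids <- unfoldForestM_BF f (concat css)  -- unfold the next level in one batch
    pure (rebuild xs css kids)
  where
    rebuild (x:xs') (cs:css') kids =
        let (mine, rest) = splitAt (length cs) kids
        in Node x mine : rebuild xs' css' rest
    rebuild _ _ _ = []
```

For example, unfolding from the seed 1 with `\x -> Just (x, if x < 3 then [2*x, 2*x+1] else [])` produces the tree `Node 1 [Node 2 [Node 4 [], Node 5 []], Node 3 []]`.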
A possible optimization is to stop taking the length. Anything in list processing that takes a length screams "wrong" to me (although it's not always true!) so I often try to find a way to avoid it. The first option would be to keep the cs on line 8 around, and use it as an indicator of the length. That keeps it around longer than strictly necessary, though. The other option is to add a third level: for breadthFirst above, we had one level; for levels, we added another, to indicate the structure of the nodes and their subtrees; here, we can add a third, to maintain that structure when building back up:
unfoldForestM_BF :: Monad m => (b -> m (a, [b])) -> [b] -> m (Forest a)
unfoldForestM_BF f ts = b [ts] (\ls -> concat . ls)
  where
    b [] k = pure (k id [])
    b qs k = foldl g b qs [] (\ls -> k id . ls)

    g a xs qs k = foldr t (\ls ys -> a ys (k . run ls)) xs [] qs

    t a fw xs bw = f a >>= \(x,cs) -> fw (x:xs) (cs:bw)

    run x xs = uncurry (:) . foldl go ((,) [] . xs) x
      where
        go ys y (z:zs) = (Node y z : ys', zs')
          where
            (ys',zs') = ys zs
This unfortunately slows down the code.
Okasaki, Chris. 2000. "Breadth-first Numbering: Lessons from a Small Exercise in Algorithm Design." In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, 131–136. ICFP '00. New York, NY, USA: ACM. doi:10.1145/351240.351253. https://www.cs.tufts.edu/~nr/cs257/archive/chrisokasaki/breadthfirst.pdf.
While working on something else, I figured out a nice Haskell implementation of Huffman coding, and I thought I'd share it here. I'll go through a few techniques for transforming a multi-pass algorithm into a single-pass one first, and then I'll show how to use them for Huffman. If you just want to skip to the code, it's provided at the end.^{1}
The algorithm isn't single-pass in the sense of Adaptive Huffman Coding: it still uses the normal Huffman algorithm, but the input is transformed in the same traversal that builds the tree.
There are several techniques for turning multi-pass algorithms into single-pass ones in functional languages. Perhaps the most famous is circular programming: using laziness to eliminate a pass. R. S. Bird (1984) used this to great effect in solving the repmin problem:
Given a tree of integers, replace every integer with the minimum integer in the tree, in one pass.
For an imperative programmer, the problem is relatively easy: first, write the code to find the minimum value in the tree in the standard way, using a loop and a “smallest so far” accumulator. Then, inside the loop, after updating the accumulator, set the value of the leaf to be a reference to the accumulator.
At first, that solution may seem necessarily impure: we’re using global, mutable state to update many things at once. However, as the paper shows, we can claw back purity using laziness:
data Tree a = Leaf a | Tree a :*: Tree a

repMin :: Tree Integer -> Tree Integer
repMin xs = ys
  where
    (m, ys) = go xs
    go (Leaf x) = (x, Leaf m)
    go (xs :*: ys) = (min x y, xs' :*: ys')
      where
        (x,xs') = go xs
        (y,ys') = go ys
Let’s say we don’t have laziness at our disposal: are we hosed? No!^{2} Danvy and Goldberg (2005) explore this very issue, by posing the question:
Given two lists, xs and ys, can you zip xs with the reverse of ys in one pass?
The technique used to solve the problem is named “There and Back Again”; it should be clear why from one of the solutions:
convolve xs ys = walk xs const
  where
    walk []     k = k [] ys
    walk (x:xs) k = walk xs (\r (y:ys) -> k ((x,y) : r) ys)
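To see it in action: convolve really does pair the first list against the reversed second, with one walk down each list (the type signature here is my own annotation):

```haskell
-- There-and-back-again convolution: the walk down xs builds up a function,
-- which then consumes ys on the way back up.
convolve :: [a] -> [b] -> [(a, b)]
convolve xs ys = walk xs const
  where
    walk []      k = k [] ys
    walk (x:xs') k = walk xs' (\r (y:ys') -> k ((x, y) : r) ys')
```

`convolve [1,2,3] "abc"` gives `[(1,'c'),(2,'b'),(3,'a')]`, i.e. `zip [1,2,3] (reverse "abc")`.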
The traversal of one list builds up the function to consume the other. We could write repmin in the same way:
repMin = uncurry ($) . go
  where
    go (Leaf x) = (Leaf, x)
    go (xs :*: ys) = (\m -> xs' m :*: ys' m, min xm ym)
      where
        (xs',xm) = go xs
        (ys',ym) = go ys
If you’re doing a lot of appending to some listlike structure, you probably don’t want to use actual lists: you’ll end up traversing the lefthandside of the append many more times than necessary. A type you can drop in to use instead is difference lists (Hughes 1986):
type DList a = [a] -> [a]

rep :: [a] -> DList a
rep = (++)

abs :: DList a -> [a]
abs xs = xs []

append :: DList a -> DList a -> DList a
append = (.)
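A quick check of the representation (rep and abs are renamed toDList and fromDList here only to avoid clashing with Prelude's abs):

```haskell
-- Difference lists: a list is represented by the function that prepends it.
type DList a = [a] -> [a]

toDList :: [a] -> DList a
toDList = (++)

fromDList :: DList a -> [a]
fromDList xs = xs []

appendD :: DList a -> DList a -> DList a
appendD = (.)
```

Appending is just composition, so a long chain of appends builds a chain of closures; the underlying lists are only walked once, when fromDList supplies the final `[]`.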
append is $\mathcal{O}(1)$ in this representation. In fact, for any monoid with a slow mappend, you can use the same trick: it's called the Cayley representation, and it's available as Endo in Data.Monoid.
rep :: Monoid a => a -> Endo a
rep x = Endo (mappend x)

abs :: Monoid a => Endo a -> a
abs (Endo f) = f mempty

instance Monoid (Endo a) where
  mempty = Endo id
  mappend (Endo f) (Endo g) = Endo (f . g)
You can actually do the same transformation for “monoids” in the categorical sense: applying it to monads, for instance, will give you codensity (Rivas and Jaskelioff 2014).
Looking back—just for a second—to the repmin example, we should be able to spot a pattern we can generalize. There's really nothing tree-specific about it, so why can't we apply it to lists? Or other structures, for that matter? It turns out we can: the mapAccumL function is tailor-made for this need:
repMin :: Traversable t => t Integer -> t Integer
repMin xs = ys
  where
    (~(Just m), ys) = mapAccumL f Nothing xs
    f Nothing  x = (Just x, m)
    f (Just y) x = (Just (min x y), m)
The tilde before the Just ensures this won't fail on empty input.
Finally, it’s time for the main event. Huffman coding is a very multipass algorithm, usually. The steps look like this:
We can’t skip any of these steps: we can try perform them all at once, though.
Let’s write the multipass version first. We’ll need the frequency table:
frequencies :: Ord a => [a] -> Map a Int
frequencies = Map.fromListWith (+) . map (flip (,) 1)
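For example (definitions as above, with the containers imports spelled out):

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Count occurrences: each element becomes a (key, 1) pair, and fromListWith
-- merges duplicates with (+).
frequencies :: Ord a => [a] -> Map a Int
frequencies = Map.fromListWith (+) . map (flip (,) 1)
```

`frequencies "abbc"` is `fromList [('a',1),('b',2),('c',1)]`.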
And a heap, ordered on the frequencies of its elements (I’m using a skew heap here):
data Heap a
  = Nil
  | Node {-# UNPACK #-} !Int a (Heap a) (Heap a)

instance Monoid (Heap a) where
  mappend Nil ys = ys
  mappend xs Nil = xs
  mappend h1@(Node i x lx rx) h2@(Node j y ly ry)
    | i <= j    = Node i x (mappend h2 rx) lx
    | otherwise = Node j y (mappend h1 ry) ly
  mempty = Nil
Next, we need to build the tree^{3}. We can use the tree type from above.
buildTree :: Map a Int -> Maybe (Tree a)
buildTree = prune . toHeap
  where
    toHeap = Map.foldMapWithKey (\k v -> Node v (Leaf k) Nil Nil)
    prune Nil = Nothing
    prune (Node i x l r) = case mappend l r of
      Nil -> Just x
      Node j y l' r' ->
        prune (mappend (Node (i+j) (x :*: y) Nil Nil) (mappend l' r'))
Then, a way to convert between the tree and a map:
toMapping :: Ord a => Tree a -> Map a [Bool]
toMapping (Leaf x) = Map.singleton x []
toMapping (xs :*: ys) =
  Map.union (fmap (True:) (toMapping xs)) (fmap (False:) (toMapping ys))
And finally, putting the whole thing together:
huffman :: Ord a => [a] -> (Maybe (Tree a), [[Bool]])
huffman xs = (tree, map (mapb Map.!) xs)
  where
    freq = frequencies xs
    tree = buildTree freq
    mapb = maybe Map.empty toMapping tree
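Here is the whole multi-pass pipeline as one runnable sketch. Two details are adapted so it compiles on modern GHC without a Semigroup instance: the heap's mempty/mappend are written as standalone emptyH-style functions (mergeH below), and the heap is built with a plain foldrWithKey rather than foldMapWithKey; neither change affects the algorithm.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

data Tree a = Leaf a | Tree a :*: Tree a deriving Show

data Heap a = Nil | Node !Int a (Heap a) (Heap a)

-- skew-heap merge, as in the Monoid instance above
mergeH :: Heap a -> Heap a -> Heap a
mergeH Nil ys = ys
mergeH xs Nil = xs
mergeH h1@(Node i x lx rx) h2@(Node j y ly ry)
  | i <= j    = Node i x (mergeH h2 rx) lx
  | otherwise = Node j y (mergeH h1 ry) ly

frequencies :: Ord a => [a] -> Map a Int
frequencies = Map.fromListWith (+) . map (flip (,) 1)

buildTree :: Map a Int -> Maybe (Tree a)
buildTree =
    prune . Map.foldrWithKey (\k v h -> mergeH (Node v (Leaf k) Nil Nil) h) Nil
  where
    prune Nil = Nothing
    prune (Node i x l r) = case mergeH l r of
      Nil -> Just x
      Node j y l' r' ->
        prune (mergeH (Node (i + j) (x :*: y) Nil Nil) (mergeH l' r'))

toMapping :: Ord a => Tree a -> Map a [Bool]
toMapping (Leaf x) = Map.singleton x []
toMapping (xs :*: ys) =
  Map.union (fmap (True:) (toMapping xs)) (fmap (False:) (toMapping ys))

huffman :: Ord a => [a] -> (Maybe (Tree a), [[Bool]])
huffman xs = (tree, map (mapb Map.!) xs)
  where
    freq = frequencies xs
    tree = buildTree freq
    mapb = maybe Map.empty toMapping tree
```

Running `huffman "abcba"` yields one code per input element, and, as Huffman codes should be, the codes are prefix-free.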
The first thing to fix is the toMapping function: at every level, it calls union, a complex and expensive operation. However, union and empty form a monoid, so we can use the Cayley representation to reduce the calls to a minimum. Next, we want to get rid of the fmaps: we can do that by assembling a function to perform the fmap as we go, as in convolve^{4}.
toMapping :: Ord a => Tree a -> Map a [Bool]
toMapping tree = go tree id Map.empty
  where
    go (Leaf x)    k = Map.insert x (k [])
    go (xs :*: ys) k =
      go xs (k . (:) True) . go ys (k . (:) False)
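Checking the Cayley-fied version on a tiny tree (definitions as in the text, with imports spelled out):

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

data Tree a = Leaf a | Tree a :*: Tree a

-- One Map.insert per leaf; the path of booleans down to each leaf is built
-- as a difference list, and the maps are combined by plain composition.
toMapping :: Ord a => Tree a -> Map a [Bool]
toMapping tree = go tree id Map.empty
  where
    go (Leaf x)    k = Map.insert x (k [])
    go (xs :*: ys) k = go xs (k . (:) True) . go ys (k . (:) False)
```

`toMapping ((Leaf 'a' :*: Leaf 'b') :*: Leaf 'c')` is `fromList [('a',[True,True]),('b',[True,False]),('c',[False])]`: each code is the path from the root, True for left and False for right.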
Secondly, we can integrate the toMapping function with the buildTree function, removing another pass:
buildTree :: Ord a => Map a Int -> Maybe (Tree a, Map a [Bool])
buildTree = prune . toHeap
  where
    toHeap = Map.foldMapWithKey (\k v -> Node v (Leaf k, leaf k) Nil Nil)
    prune Nil = Nothing
    prune (Node i x l r) = case mappend l r of
      Nil -> Just (fmap (\k -> k id Map.empty) x)
      Node j y l' r' ->
        prune (mappend (Node (i+j) (cmb x y) Nil Nil) (mappend l' r'))
    leaf x k = Map.insert x (k [])
    node xs ys k = xs (k . (:) True) . ys (k . (:) False)
    cmb (xt,xm) (yt,ym) = (xt :*: yt, node xm ym)
Finally, to remove the second pass over the list, we can copy repmin, using mapAccumL to both construct the mapping and apply it to the structure in one go.
huffman :: (Ord a, Traversable t) => t a -> (Maybe (Tree a), t [Bool])
huffman xs = (fmap fst tree, ys)
  where
    (freq,ys) = mapAccumL f Map.empty xs
    f fm x = (Map.insertWith (+) x 1 fm, mapb Map.! x)
    tree = buildTree freq
    mapb = maybe Map.empty snd tree
And that’s it!
The similarity between the repmin function and the solution above is suggestive: is there a way to encode this idea of making a multi-pass algorithm single-pass? Of course! We can use an applicative:
data Circular a b c =
  Circular !a
           (b -> c)

instance Functor (Circular a b) where
  fmap f (Circular tally run) = Circular tally (f . run)

instance Monoid a =>
         Applicative (Circular a b) where
  pure x = Circular mempty (const x)
  Circular fl fr <*> Circular xl xr =
    Circular
      (mappend fl xl)
      (\r -> fr r (xr r))
liftHuffman
  :: Ord a
  => a -> Circular (Map a Int) (Map a [Bool]) [Bool]
liftHuffman x = Circular (Map.singleton x 1) (Map.! x)

runHuffman
  :: Ord a
  => Circular (Map a Int) (Map a [Bool]) r -> (Maybe (Tree a), r)
runHuffman (Circular smry run) =
  maybe (Nothing, run Map.empty) (Just *** run) (buildTree smry)

huffman
  :: (Ord a, Traversable t)
  => t a -> (Maybe (Tree a), t [Bool])
huffman = runHuffman . traverse liftHuffman
Thanks to it being an applicative, you can do all the fun lensy things with it:
showBin :: [Bool] -> String
showBin = map (bool '0' '1')
>>> let liftBin = fmap showBin . liftHuffman
>>> (snd . runHuffman . (each.traverse) liftBin) ("abb", "cad", "c")
(["01","11","11"],["00","01","10"],["00"])
Bringing us back to the start, it can also let us solve repmin!
liftRepMin :: a -> Circular (Option (Min a)) a a
liftRepMin x = Circular (pure (pure x)) id

runRepMin :: Circular (Option (Min a)) a b -> b
runRepMin (Circular m r) = r (case m of
  Option (Just (Min x)) -> x)

repMin :: (Ord a, Traversable t) => t a -> t a
repMin = runRepMin . traverse liftRepMin
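Here is that repmin instance as a runnable sketch. One adaptation: Option has since been removed from base, so this uses Maybe (Min a) instead, which has the same Monoid.

```haskell
import Data.Semigroup (Min (..))

-- The product of writer (the tally) and reader (the run function).
data Circular a b c = Circular !a (b -> c)

instance Functor (Circular a b) where
  fmap f (Circular tally run) = Circular tally (f . run)

instance Monoid a => Applicative (Circular a b) where
  pure x = Circular mempty (const x)
  Circular fl fr <*> Circular xl xr =
    Circular (fl <> xl) (\r -> fr r (xr r))

liftRepMin :: a -> Circular (Maybe (Min a)) a a
liftRepMin x = Circular (Just (Min x)) id

runRepMin :: Circular (Maybe (Min a)) a b -> b
runRepMin (Circular m r) = r (case m of Just (Min x) -> x)

repMin :: (Ord a, Traversable t) => t a -> t a
repMin = runRepMin . traverse liftRepMin
```

`repMin [3,1,2]` is `[1,1,1]`, and `repMin "cab"` is `"aaa"`: the traversal writes the tally and builds the run function at the same time, and runRepMin feeds the finished minimum back in.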
So the Circular type is actually just the product of reader and writer, and is closely related to the sort type. It's also related to the Prescient type, which I noticed after I'd written the above.
Bird, R. S. 1984. “Using Circular Programs to Eliminate Multiple Traversals of Data.” Acta Inf. 21 (3) (October): 239–250. doi:10.1007/BF00264249. http://dx.doi.org/10.1007/BF00264249.
Bird, Richard, Geraint Jones, and Oege De Moor. 1997. "More haste, less speed: Lazy versus eager evaluation." Journal of Functional Programming 7 (5) (September): 541–547. doi:10.1017/S0956796897002827. https://ora.ox.ac.uk/objects/uuid:761a464660a24622a1e0ddea11507d57/datastreams/ATTACHMENT01.
Danvy, Olivier, and Mayer Goldberg. 2005. “There and Back Again.” http://brics.dk/RS/05/3/BRICSRS053.pdf.
Hughes, R. John Muir. 1986. “A Novel Representation of Lists and Its Application to the Function ‘Reverse’.” Information Processing Letters 22 (3) (March): 141–144. doi:10.1016/00200190(86)900591. http://www.sciencedirect.com/science/article/pii/0020019086900591.
Pippenger, Nicholas. 1997. “Pure Versus Impure Lisp.” ACM Trans. Program. Lang. Syst. 19 (2) (March): 223–238. doi:10.1145/244795.244798. http://doi.acm.org/10.1145/244795.244798.
Rivas, Exequiel, and Mauro Jaskelioff. 2014. “Notions of Computation as Monoids.” arXiv:1406.4823 [cs, math] (May). http://arxiv.org/abs/1406.4823.
Huffman coding single-pass implementation:
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Traversable (mapAccumL)

data Heap a
  = Nil
  | Node {-# UNPACK #-} !Int a (Heap a) (Heap a)

instance Monoid (Heap a) where
  mappend Nil ys = ys
  mappend xs Nil = xs
  mappend h1@(Node i x lx rx) h2@(Node j y ly ry)
    | i <= j    = Node i x (mappend h2 rx) lx
    | otherwise = Node j y (mappend h1 ry) ly
  mempty = Nil

data Tree a = Leaf a | Tree a :*: Tree a

buildTree :: Ord a => Map a Int -> Maybe (Tree a, Map a [Bool])
buildTree = prune . toHeap
  where
    toHeap = Map.foldMapWithKey (\k v -> Node v (Leaf k, leaf k) Nil Nil)
    prune Nil = Nothing
    prune (Node i x l r) = case mappend l r of
      Nil -> Just (fmap (\k -> k id Map.empty) x)
      Node j y l' r' ->
        prune (mappend (Node (i+j) (cmb x y) Nil Nil) (mappend l' r'))
    leaf x k = Map.insert x (k [])
    node xs ys k = xs (k . (:) True) . ys (k . (:) False)
    cmb (xt,xm) (yt,ym) = (xt :*: yt, node xm ym)

huffman :: (Ord a, Traversable t) => t a -> (Maybe (Tree a), t [Bool])
huffman xs = (fmap fst tree, ys)
  where
    (freq,ys) = mapAccumL f Map.empty xs
    f fm x = (Map.insertWith (+) x 1 fm, mapb Map.! x)
    tree = buildTree freq
    mapb = maybe Map.empty snd tree
Well, that’s a little bit of a lie. In terms of asympostics, Pippenger (1997) stated a problem that could be solved in linear time in impure Lisp, but $\Omega(n \log n)$ in pure Lisp. R. Bird, Jones, and Moor (1997) then produced an algorithm that could solve the problem in linear time, by using laziness. So, in some cases, laziness will give you asymptotics you can’t get without it (if you want to stay pure).↩
There’s actually a nicer version of the buildTree
function which uses StateT (Heap a) Maybe
, but it’s equivalent to this one under the hood, and I though might be a little distracting.↩
Something to notice about this function is that it's going top-down and bottom-up at the same time. Combining the maps (with (.)) is done bottom-up, but building the codes is top-down. This means the codes are built in reverse order! That's why the accumulating parameter (k) is a difference list, rather than a normal list. As it happens, if normal lists were used, the function would be slightly more efficient through sharing, but the codes would all be reversed.↩
Here’s an old Haskell chestnut:
>>> filterM (\_ -> [False, True]) [1,2,3]
[[],[3],[2],[2,3],[1],[1,3],[1,2],[1,2,3]]
filterM (\_ -> [False,True]) gives the power set of some input list. It's one of the especially magical demonstrations of monads. From a high-level perspective, it makes sense: for each element in the list, we want it to be present in one output, and not present in another. It's hard to see how it actually works, though. The (old^{1}) source for filterM doesn't help hugely, either:
filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]
filterM _ []     = return []
filterM p (x:xs) = do
   flg <- p x
   ys  <- filterM p xs
   return (if flg then x:ys else ys)
Again, elegant and beautiful (aside from the three-space indent), but opaque. Despite not really getting how it works, I was encouraged by its simplicity to try my hand at some of the other functions from Data.List.
Let’s start with the subject of my last post. Here’s the implementation:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy p xs = build (\c n ->
  let f x a q
        | q x = (x : ys, zs)
        | otherwise = ([], c (x : ys) zs)
        where (ys,zs) = a (p x)
  in snd (foldr f (const ([], n)) xs (const False)))
It translates over pretty readily:
groupByM :: Applicative m => (a -> a -> m Bool) -> [a] -> m [[a]]
groupByM p xs =
    fmap snd (foldr f (const (pure ([], []))) xs (const (pure False)))
  where
    f x a q = liftA2 st (q x) (a (p x))
      where
        st b (ys,zs)
          | b = (x : ys, zs)
          | otherwise = ([], (x:ys):zs)
Let’s try it with a similar example to filterM
:
>>> groupByM (\_ _ -> [False, True]) [1,2,3]
[[[1],[2],[3]],[[1],[2,3]],[[1,2],[3]],[[1,2,3]]]
It gives the partitions of the list!
So these monadic generalisations have been discovered before, several times over. There's even a package with monadic versions of the functions in Data.List. Exploring this idea with a little more formality is the paper "All Sorts of Permutations" (Christiansen, Danilenko, and Dylus 2016), and an accompanying presentation on YouTube. They show that the monadic version of sort produces permutations of the input list, and examine the output from different sorting algorithms. Here are a couple of their implementations, altered slightly:
insertM :: Monad m => (a -> a -> m Bool) -> a -> [a] -> m [a]
insertM _ x [] = pure [x]
insertM p x yys@(y:ys) = do
  lte <- p x y
  if lte
    then pure (x:yys)
    else fmap (y:) (insertM p x ys)

insertSortM :: Monad m => (a -> a -> m Bool) -> [a] -> m [a]
insertSortM p = foldrM (insertM p) []
partitionM :: Applicative m => (a -> m Bool) -> [a] -> m ([a],[a])
partitionM p = foldr f (pure ([],[]))
  where
    f x = liftA2 ifStmt (p x)
      where
        ifStmt flg (tr,fl)
          | flg = (x:tr,fl)
          | otherwise = (tr,x:fl)

quickSortM :: Monad m => (a -> a -> m Bool) -> [a] -> m [a]
quickSortM p [] = pure []
quickSortM p (x:xs) = do
  (gt,le) <- partitionM (p x) xs
  ls <- quickSortM p le
  gs <- quickSortM p gt
  pure (ls ++ [x] ++ gs)
>>> insertSortM (\_ _ -> [False,True]) [1,2,3]
[[1,2,3],[1,3,2],[3,1,2],[2,1,3],[2,3,1],[3,2,1]]
>>> quickSortM (\_ _ -> [False,True]) [1,2,3]
[[3,2,1],[2,3,1],[2,1,3],[3,1,2],[1,3,2],[1,2,3]]
As you can see, they're concise and elegant, and they strongly resemble the pure versions of the algorithms.
So the examples above are very interesting and cool, but they don't necessarily have a place in real Haskell code. If you wanted to find the permutations, partitions, or power set of a list you'd probably use a more standard implementation. That's not to say that these monadic functions have no uses, though: especially when coupled with State, they yield readable and fast implementations for certain tricky functions. ordNub, for instance:
ordNub :: Ord a => [a] -> [a]
ordNub =
  flip evalState Set.empty .
  filterM
    (\x -> do
       flg <- gets (Set.notMember x)
       when flg (modify (Set.insert x))
       pure flg)
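With the imports spelled out, this runs as-is (it uses Data.Set from containers and Control.Monad.State from mtl):

```haskell
import Control.Monad (filterM, when)
import Control.Monad.State (evalState, gets, modify)
import qualified Data.Set as Set

-- filterM in the State monad: keep an element iff we haven't seen it yet,
-- recording everything we keep in a Set.
ordNub :: Ord a => [a] -> [a]
ordNub =
  flip evalState Set.empty .
  filterM
    (\x -> do
       flg <- gets (Set.notMember x)
       when flg (modify (Set.insert x))
       pure flg)
```

Unlike nub, this is O(n log n) rather than O(n²), and it keeps the first occurrence of each element: `ordNub "mississippi"` is `"misp"`.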
Alternatively, using a monadic version of maximumOn:
maximumOnM :: (Applicative m, Ord b) => (a -> m b) -> [a] -> m (Maybe a)
maximumOnM p = (fmap . fmap) snd . foldl f (pure Nothing)
  where
    f a e = liftA2 g a (p e)
      where
        g Nothing q = Just (q, e)
        g b@(Just (o, y)) q
          | o < q = Just (q, e)
          | otherwise = b
You can write a one-pass mostFrequent:
mostFrequent :: Ord a => [a] -> Maybe a
mostFrequent =
  flip evalState Map.empty .
  maximumOnM
    (\x -> maybe 1 succ <$> state (Map.insertLookupWithKey (const (+)) x 1))
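Putting both pieces together and running them (imports spelled out; containers and mtl assumed):

```haskell
import Control.Applicative (liftA2)
import Control.Monad.State (evalState, state)
import qualified Data.Map.Strict as Map

maximumOnM :: (Applicative m, Ord b) => (a -> m b) -> [a] -> m (Maybe a)
maximumOnM p = (fmap . fmap) snd . foldl f (pure Nothing)
  where
    f a e = liftA2 g a (p e)
      where
        g Nothing q = Just (q, e)
        g b@(Just (o, _)) q
          | o < q     = Just (q, e)
          | otherwise = b

-- One pass: the running counts live in State (insertLookupWithKey both bumps
-- the count and returns the old one), and maximumOnM tracks the best so far.
mostFrequent :: Ord a => [a] -> Maybe a
mostFrequent =
  flip evalState Map.empty .
  maximumOnM
    (\x -> maybe 1 succ <$> state (Map.insertLookupWithKey (const (+)) x 1))
```

`mostFrequent "abbbcc"` is `Just 'b'`; on an empty list it returns Nothing. Note that ties go to whichever element reached the winning count first.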
One of the nicest things about the paper was the diagrams of decision trees provided for each sorting algorithm. I couldn’t find a library to do that for me, so I had a go at producing my own. First, we’ll need a data type to represent the tree itself:
data DecTree t a
  = Pure a
  | Choice t (DecTree t a) (DecTree t a)
  deriving Functor
We’ll say the left branch is “true” and the right “false”. Applicative and monad instances are relatively mechanical^{2}:
instance Applicative (DecTree t) where
  pure = Pure
  Pure f <*> xs = fmap f xs
  Choice c ls rs <*> xs = Choice c (ls <*> xs) (rs <*> xs)

instance Monad (DecTree t) where
  Pure x >>= f = f x
  Choice c ls rs >>= f = Choice c (ls >>= f) (rs >>= f)
We can now create a comparator function that constructs one of these trees, and remembers the values it was given:
traceCompare :: a -> a -> DecTree (a,a) Bool
traceCompare x y = Choice (x,y) (Pure True) (Pure False)
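As a sanity check, running insertion sort in this tree and collecting the results at the leaves (with a hypothetical leaves helper, not from the post) yields exactly the six permutations of a three-element list:

```haskell
import Data.Foldable (foldrM)
import Data.List (permutations, sort)

data DecTree t a
  = Pure a
  | Choice t (DecTree t a) (DecTree t a)

instance Functor (DecTree t) where
  fmap f (Pure x)       = Pure (f x)
  fmap f (Choice c l r) = Choice c (fmap f l) (fmap f r)

instance Applicative (DecTree t) where
  pure = Pure
  Pure f <*> xs         = fmap f xs
  Choice c ls rs <*> xs = Choice c (ls <*> xs) (rs <*> xs)

instance Monad (DecTree t) where
  Pure x >>= f         = f x
  Choice c ls rs >>= f = Choice c (ls >>= f) (rs >>= f)

traceCompare :: a -> a -> DecTree (a, a) Bool
traceCompare x y = Choice (x, y) (Pure True) (Pure False)

insertM :: Monad m => (a -> a -> m Bool) -> a -> [a] -> m [a]
insertM _ x [] = pure [x]
insertM p x yys@(y:ys) = do
  lte <- p x y
  if lte then pure (x:yys) else fmap (y:) (insertM p x ys)

insertSortM :: Monad m => (a -> a -> m Bool) -> [a] -> m [a]
insertSortM p = foldrM (insertM p) []

-- every outcome at the leaves of the decision tree, left (True) branch first
leaves :: DecTree t a -> [a]
leaves (Pure x)       = [x]
leaves (Choice _ l r) = leaves l ++ leaves r
```

`leaves (insertSortM traceCompare [1,2,3])` has six elements, one per permutation, mirroring the list-monad output earlier.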
Finally, to draw the tree, I’ll use a function from my binary tree library:
printDecTree :: (Show a, Show b) => String -> DecTree (a,a) b -> IO ()
printDecTree rel t = putStr (drawTreeWith id (go t) "")
  where
    go (Pure xs) = Node (show xs) Leaf Leaf
    go (Choice (x,y) tr fl) =
      Node (show x ++ rel ++ show y) (go tr) (go fl)
And we get these really nice diagrams out:
>>> (printDecTree "<=" . insertSortM traceCompare) [1,2,3]
         ┌[1,2,3]
    ┌1<=2┤
    │    │    ┌[2,1,3]
    │    └1<=3┤
    │         └[2,3,1]
2<=3┤
    │    ┌[1,3,2]
    └1<=3┤
         │    ┌[3,1,2]
         └1<=2┤
              └[3,2,1]
>>> (printDecTree "<=" . quickSortM traceCompare) [1,2,3]
              ┌[1,2,3]
         ┌2<=3┤
         │    └[1,3,2]
    ┌1<=3┤
    │    └[3,1,2]
1<=2┤
    │    ┌[2,1,3]
    └1<=3┤
         │    ┌[2,3,1]
         └2<=3┤
              └[3,2,1]
We can also try it out with the other monadic list functions:
>>> (printDecTree "=" . groupByM traceCompare) [1,2,3]
       ┌[[1,2,3]]
   ┌2=3┤
   │   └[[1,2],[3]]
1=2┤
   │   ┌[[1],[2,3]]
   └2=3┤
       └[[1],[2],[3]]
You might notice that none of these “monadic” functions actually require a monad constraint: they’re all applicative. There’s a straightforward implementation that relies only on applicative for most of these functions, with a notable exception: sort. Getting that to work with just applicative is the subject of a future post.
Christiansen, Jan, Nikita Danilenko, and Sandra Dylus. 2016. “All Sorts of Permutations (Functional Pearl).” In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, 168–179. ICFP 2016. New York, NY, USA: ACM. doi:10.1145/2951913.2951949. http://informatik.unikiel.de/~sad/icfp2016preprint.pdf.
The definition has since been updated to more modern Haskell: it now uses a fold, and only requires Applicative.↩
Part of the reason the instances are so mechanical is that this type strongly resembles the free monad:
data Free f a = Pure a | Free (f (Free f a))
In fact, the example given in the MonadFree class is the following:
data Pair a = Pair a a
type Tree = Free Pair
The only difference between the above type and the decision tree is that the decision tree carries a tag with it.
So what’s so interesting about this relationship? Well, Pair
is actually a representable functor. Any representable functor f a
can be converted to (and from) a function key > a
, where key
is the specific key for f
. The key for Pair
is Bool
: the result of the function we passed in to the sorting functions!
In general, you can make a "decision tree" for any function of type a -> b like so:

type DecTree a b r = Rep f ~ b => Free (Compose ((,) a) f) r
But more on that in a later post.↩
Here’s a useful function from Data.List:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]

groupBy (==) "aabcdda"
-- ["aa","b","c","dd","a"]
However, as has been pointed out before^{1}, groupBy expects an equivalence relation, and can exhibit surprising behavior when it doesn't get one. Let's say, for instance, that we wanted to group numbers that were close together:
groupClose :: [Integer] -> [[Integer]]
groupClose = groupBy (\x y -> abs (x - y) < 3)
What would you expect on the list [1, 2, 3, 4, 5]? All in the same group? Well, what you actually get is:
[[1,2,3],[4,5]]
This is because the implementation of groupBy only compares to the first element in each group:
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
  where (ys,zs) = span (eq x) xs
Brandon Simmons gave a definition of groupBy that is perhaps more useful, but it used explicit recursion, rather than a fold.
A definition with foldr turned out to be trickier than I expected. I found some of the laziness properties especially difficult:
>>> head (groupBy (==) (1:2:undefined))
[1]
>>> (head . head) (groupBy (==) (1:undefined))
1
>>> (head . head . tail) (groupBy (==) (1:2:undefined))
2
Here’s the definition I came up with, after some deliberation:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy p xs = build (\c n ->
  let f x a q
        | q x = (x : ys, zs)
        | otherwise = ([], c (x : ys) zs)
        where (ys,zs) = a (p x)
  in snd (foldr f (const ([], n)) xs (const False)))
{-# INLINE groupBy #-}
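Those laziness properties can be checked outside GHCi too; build comes from GHC.Exts, which ships with GHC:

```haskell
import GHC.Exts (build)

-- The fold-based groupBy from above. The where-bound pair (ys, zs) is
-- matched lazily, which is what makes the partial-input examples work.
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy p xs = build (\c n ->
  let f x a q
        | q x = (x : ys, zs)
        | otherwise = ([], c (x : ys) zs)
        where (ys, zs) = a (p x)
  in snd (foldr f (const ([], n)) xs (const False)))
```

Each of the three expressions with undefined tails from earlier evaluates without touching the undefined part.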
Seemingly benign changes to the function will break one or more of the above tests. In particular, the laziness of a “where” binding needs to be taken into account. Here’s an early attempt which failed:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy p xs = build (\c n ->
  let f x a q d
        | q x = a (p x) (d . (:) x)
        | otherwise = d [] (a (p x) (c . (:) x))
  in foldr f (\_ d -> d [] n) xs (const False) (\_ y -> y))
Once done, though, it works as expected:
>>> groupBy (==) "aaabcccdda"
["aaa","b","ccc","dd","a"]
>>> groupBy (==) []
[]
>>> groupBy (<=) [1,2,2,3,1,2,0,4,5,2]
[[1,2,2,3],[1,2],[0,4,5],[2]]
It’s the fastest version I could find that obeyed the above laziness properties.
The GHC page on the issue unfortunately seems to indicate the implementation won’t be changed. Ah, well. Regardless, I have a repository with the implementation above (with extra fusion machinery added) and comparisons to other implementations.
There are several threads on the libraries mailing list on this topic:
There are three main ways to fold things in Haskell: from the right, from the left, and from either side. Let's look at the left vs right variants first. foldr works from the right:
foldr (+) 0 [1,2,3]
1 + (2 + (3 + 0))
And foldl from the left:
foldl (+) 0 [1,2,3]
((0 + 1) + 2) + 3
As you’ll notice, the result of the two operations above is the same (6; although one may take much longer than the other). In fact, whenever the result of foldr
and foldl
is the same for a pair of arguments (in this case +
and 0
), we say that that pair forms a Monoid
for some type (well, there’s some extra stuff to do with 0
, but I only care about associativity at the moment). In this case, the Sum
monoid is formed:
newtype Sum a = Sum { getSum :: a }
instance Num a => Monoid (Sum a) where
mempty = Sum 0
mappend (Sum x) (Sum y) = Sum (x + y)
When you know that you have a monoid, you can use the foldMap function: this is the third kind of fold. It says that you don't care which of foldl or foldr is used, so the implementer of foldMap can put the parentheses wherever they want:
foldMap Sum [1,2,3]
(1 + 2) + (3 + 0)
0 + ((1 + 2) + 3)
((0 + 1) + 2) + 3
And we can’t tell the difference from the result. This is a pretty barebones introduction to folds and monoids: you won’t need to know more than that for the rest of this post, but the topic area is fascinating and deep, so don’t let me give you the impression that I’ve done anything more than scratched the surface.
Quite often, we do care about where the parentheses go. Take, for instance, a binary tree type, with values at the leaves:
data Tree a
  = Empty
  | Leaf a
  | Tree a :*: Tree a

instance Show a =>
         Show (Tree a) where
  show Empty = "()"
  show (Leaf x) = show x
  show (l :*: r) = "(" ++ show l ++ "*" ++ show r ++ ")"
We can’t (well, shouldn’t) us foldMap
here, because we would be able to tell the difference between different arrangements of parentheses:
foldMap something [1,2,3]
((1*2)*(3*()))
(()*((1*2)*3))
(((()*1)*2)*3)
So we use one of the folds which lets us choose the arrangements of parentheses:
(foldr (:*:) Empty . map Leaf) [1,2,3,4,5,6]
-- (1*(2*(3*(4*(5*(6*()))))))

(foldl (:*:) Empty . map Leaf) [1,2,3,4,5,6]
-- ((((((()*1)*2)*3)*4)*5)*6)
The issue is that neither of the trees generated are necessarily what we want: often, we want something more balanced.
To try and find a more balanced fold, let's (for now) assume we're always going to get nonempty input. This will let us simplify the Tree type a little, to:
data Tree a
  = Leaf a
  | Tree a :*: Tree a
  deriving Foldable

instance Show a =>
         Show (Tree a) where
  show (Leaf x) = show x
  show (l :*: r) = "(" ++ show l ++ "*" ++ show r ++ ")"
Then, we can use Jon Fairbairn’s fold described in this email, adapted a bit for our nonempty input:
import Data.List.NonEmpty (NonEmpty(..))

treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f = go
  where
    go (x :| []) = x
    go (a :| b:l) = go (f a b :| pairMap l)
    pairMap (x:y:rest) = f x y : pairMap rest
    pairMap xs = xs
There are two parts to this function: pairMap and the go helper. pairMap combines adjacent elements in the list using the combining function. As a top-level function it might look like this:
pairMap f (x:y:rest) = f x y : pairMap f rest
pairMap f xs = xs

pairMap (++) ["a","b","c","d","e"]
-- ["ab","cd","e"]
As you can see, it leaves any leftovers untouched at the end of the list.
The go helper applies pairMap repeatedly to the list until it has only one element. This gives us much more balanced results than foldl or foldr (turn on -XOverloadedLists to write nonempty lists using this syntax):
(treeFold (:*:) . fmap Leaf) [1,2,3,4,5,6]
-- (((1*2)*(3*4))*(5*6))

(treeFold (:*:) . fmap Leaf) [1,2,3,4,5,6,7,8]
-- (((1*2)*(3*4))*((5*6)*(7*8)))
However, there are still cases where one branch will be much larger than its sibling. The fold fills a balanced binary tree from the left, but any leftover elements are put at the top level. In other words:
(treeFold (:*:) . fmap Leaf) [1..9]
-- ((((1*2)*(3*4))*((5*6)*(7*8)))*9)
That 9 hanging out on its own there is a problem.
One observation we can make is that pairMap always starts from the same side on each iteration, like a typewriter moving from one line to the next. This has the consequence of building up the leftovers on one side, leaving them until the top level.
We can improve the situation slightly by going back and forth, slalom-style, so we consume leftovers on each iteration:
treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f = goTo
  where
    goTo (y :| []) = y
    goTo (a :| b : rest) = goFro (pairMap f (f a b) rest)
    goFro (y :| []) = y
    goFro (a :| b : rest) = goTo (pairMap (flip f) (f b a) rest)
    pairMap f = go []
      where
        go ys y (a:b:rest) = go (y:ys) (f a b) rest
        go ys y [z] = z :| y : ys
        go ys y [] = y :| ys
Notice that we have to flip the combining function to make sure the ordering is the same on output. For the earlier example, this solves the issue:
(treeFold (:*:) . fmap Leaf) [1..9]
-- (((1*2)*((3*4)*(5*6)))*((7*8)*9))
It does not build up the tree as balanced as it possibly could, though:
(treeFold (:*:) . fmap Leaf) [1,2,3,4,5,6]
-- ((1*2)*((3*4)*(5*6)))
There’s four elements in the right branch, and two in the left in the above example. Three in each would be optimal.
Wait—optimal in what sense, exactly? What do we mean when we say one tree is more balanced than another? Let’s say the “balance factor” is the largest difference in size of two sibling trees:
balFac :: Tree a -> Integer
balFac = fst . go
  where
    go :: Tree a -> (Integer, Integer)
    go (Leaf _) = (0, 1)
    go (l :*: r) = (lb `max` rb `max` abs (rs - ls), rs + ls)
      where
        (lb,ls) = go l
        (rb,rs) = go r
And one tree is more balanced than another if it has a smaller balance factor.
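As a quick sanity check, here are the two nine-element trees from above built by hand (the names typewriter and slalom are mine, just for illustration), along with the balance factors balFac assigns them:

```haskell
typewriter, slalom :: Tree Integer
typewriter =
  (((Leaf 1 :*: Leaf 2) :*: (Leaf 3 :*: Leaf 4)) :*:
   ((Leaf 5 :*: Leaf 6) :*: (Leaf 7 :*: Leaf 8))) :*: Leaf 9
slalom =
  ((Leaf 1 :*: Leaf 2) :*: ((Leaf 3 :*: Leaf 4) :*: (Leaf 5 :*: Leaf 6))) :*:
  ((Leaf 7 :*: Leaf 8) :*: Leaf 9)

-- balFac typewriter == 7  (eight leaves against one at the root)
-- balFac slalom     == 3  (six leaves against three at the root)
```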
There’s effectively no limit on the balance factor for the typewriter method: when the input is one larger than a power of two, it’ll stick the one extra element in one branch and the rest in the other (as with [1..9] in the example above).
For the slalom method, it looks like there’s something more interesting going on, limitwise. I haven’t been able to verify this formally (yet), but from what I can tell, a tree of height $n$ will have at most a balance factor of the $n$th Jacobsthal number. That’s (apparently) also the number of ways to tie a tie using $n + 2$ turns.
That was just gathered from some quick experiments and oeis.org, but it seems to make sense intuitively. Jacobsthal numbers are defined like this:
j 0 = 0
j 1 = 1
j n = j (n-1) + 2 * j (n-2)
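For reference, the first few values of the sequence, computed with the j just defined:

```haskell
jacobsthals :: [Integer]
jacobsthals = map j [0..8]
-- [0,1,1,3,5,11,21,43,85]
```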
So, at the top level, there’s the imbalance caused by the second-to-last pairMap, plus the imbalance caused by the third-to-last. However, the third-to-last imbalance is twice what it was at that level, because it is now working with an already-paired-up list. Why isn’t the second-to-last imbalance also doubled? Because it’s counteracted by the fact that we turned around: the imbalance is in a leftover element. At least, that’s my intuition at this point.
The minimum balance factor is, of course, one. Unfortunately, to achieve that, I lost some of the properties of the previous folds. Up until now, I have been avoiding taking the length of the incoming list: doing so loses a lot of laziness, causes an extra traversal, and generally seems like an ugly solution. Nonetheless, it gives the most balanced results I have found so far:
treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f (x :| xs) = go (length (x:xs)) (x:xs)
  where
    go 1 [y] = y
    go n ys = f (go m a) (go (n-m) b)
      where
        (a,b) = splitAt m ys
        m = n `div` 2
splitAt is an inefficient operation, but if we let the left-hand call return its unused input from the list, we can avoid it:
treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f (x :| xs) = fst (go (length (x:xs)) (x:xs))
  where
    go 1 (y:ys) = (y,ys)
    go n ys = (f l r, rs)
      where
        (l,ls) = go m ys
        (r,rs) = go (n-m) ls
        m = n `div` 2
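With the length-based version, the six-element example from before now splits three-and-three, as hoped (NonEmpty.fromList is just to build the input without -XOverloadedLists):

```haskell
import qualified Data.List.NonEmpty as NE

balanced :: Tree Integer
balanced = (treeFold (:*:) . fmap Leaf) (NE.fromList [1,2,3,4,5,6])
-- ((1*(2*3))*(4*(5*6)))
```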
Finally, you may have spotted the state monad in this last version. We can make the similarity explicit:
import Control.Monad.State (evalState, state)

treeFold :: (a -> a -> a) -> NonEmpty a -> a
treeFold f (x :| xs) = evalState (go (length (x:xs))) (x:xs)
  where
    go 1 = state (\(y:ys) -> (y,ys))
    go n = do
      let m = n `div` 2
      l <- go m
      r <- go (n-m)
      return (f l r)
And there you have it: three different ways to fold in a more balanced way. Perhaps surprisingly, the first is the fastest in my tests. I’d love to hear if there’s a more balanced version (which is lazy, ideally) that is just as efficient as the first implementation.
I have found two other uses for these folds other than simply constructing more balanced binary trees. The first is summation of floating-point numbers. If you sum floating-point numbers in the usual way with foldl' (or, indeed, with an accumulator in an imperative language), you will see an error growth of $\mathcal{O}(n)$, where $n$ is the number of floats you’re summing.
A well-known solution to this problem is the Kahan summation algorithm. It carries with it a running compensation for accumulating errors, giving it $\mathcal{O}(1)$ error growth. There are two downsides to the algorithm: it takes four times the number of numerical operations to perform the sum, and it isn’t parallel.
For that reason, it’s often not used in practice: instead, floats are summed pairwise, in a manner often referred to as cascade summation. This is what’s used in NumPy. The error growth isn’t quite as good—$\mathcal{O}(\log{n})$—but it takes the exact same number of operations as normal summation. On top of that, it parallelizes well.
Dividing a fold into roughlyequal chunks is exactly the kind of problem encountered when trying to parallelize certain algorithms. Adapting the folds above so that their work is performed in parallel is surprisingly easy:
import Control.Parallel (par, pseq)
import GHC.Conc (numCapabilities)

splitPar :: (a -> a -> a) -> (Int -> a) -> (Int -> a) -> Int -> a
splitPar f = go
  where
    go l r 0 = f (l 0) (r 0)
    go l r n = lt `par` (rt `pseq` f lt rt)
      where
        lt = l (n-m)
        rt = r m
        m = n `div` 2

treeFoldParallel :: (a -> a -> a) -> NonEmpty a -> a
treeFoldParallel f xs =
  treeFold (splitPar f) (fmap const xs) numCapabilities
The above will split the fold into numCapabilities chunks, and perform each one in parallel. numCapabilities is a constant defined in GHC.Conc: it’s the number of threads which can be run simultaneously at any one time. Alternatively, you could have the function take a parameter for how many chunks to split the computation into. You could also have the fold adapt as it goes, choosing whether or not to spark based on how many sparks exist at any given time:
import Control.Applicative (liftA2)
import Control.Monad.ST (runST)
import Control.Monad.ST.Unsafe (unsafeIOToST)
import Data.Bool (bool)
import GHC.Conc (getNumCapabilities, numSparks)

parseq :: a -> b -> b
parseq a b =
  runST
    (bool (par a b) (seq a b) <$>
     unsafeIOToST (liftA2 (>) numSparks getNumCapabilities))

treeFoldAdaptive :: (a -> a -> a) -> a -> [a] -> a
treeFoldAdaptive f =
  Lazy.treeFold
    (\l r -> r `parseq` (l `parseq` f l r))
Adapted from this comment by Edward Kmett. This is actually the fastest version of all the folds.
All of this is provided in a library I’ve put up on Hackage.
I have been working a little more on my semirings library recently, and I have come across some interesting functions in the process. First, a quick recap on the Semiring class and some related functions:
class Semiring a where
  one :: a
  zero :: a
  infixl 6 <+>
  (<+>) :: a -> a -> a
  infixl 7 <.>
  (<.>) :: a -> a -> a

add :: (Foldable f, Semiring a) => f a -> a
add = foldl' (<+>) zero

mul :: (Foldable f, Semiring a) => f a -> a
mul = foldl' (<.>) one

instance Semiring Integer where
  one = 1
  zero = 0
  (<+>) = (+)
  (<.>) = (*)

instance Semiring Bool where
  one = True
  zero = False
  (<+>) = (||)
  (<.>) = (&&)
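With those instances in place, add and mul specialize to familiar folds: sum and product over Integer, and or and and over Bool:

```haskell
add [1, 2, 3 :: Integer]  -- 6
mul [1, 2, 3 :: Integer]  -- 6
add [True, False]         -- True
mul [True, False]         -- False
```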
You can think of it as a replacement for Num, but it turns out to be much more generally useful than that.
The first interesting function is to do with matrix multiplication. Here’s the code for multiplying two matrices represented as nested lists:
mulMatrix :: Semiring a => [[a]] -> [[a]] -> [[a]]
mulMatrix xs ys = map (\row -> map (add . zipWith (<.>) row) cs) xs
  where
    cs = transpose ys
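As a quick check over the Integer semiring, where <+> and <.> are ordinary addition and multiplication, this is the usual matrix product:

```haskell
mulMatrix [[1,2],[3,4]] [[5,6],[7,8]] :: [[Integer]]
-- [[19,22],[43,50]]
```

Over Bool, the same function computes the composition of relations, which is part of what makes the semiring abstraction worthwhile.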
One of the issues with this code (other than its woeful performance) is that it seems needlessly list-specific. zipWith seems like the kind of thing that exists on a bunch of different structures. Indeed, the ZipList wrapper uses zipWith as its <*> implementation. Let’s try that for now:
mulMatrix :: (Semiring a, Applicative f) => f (f a) -> f (f a) -> f (f a)
mulMatrix xs ys = fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
  where
    cs = transpose ys
Of course, now add needs to work on our f, so it should be Foldable:

mulMatrix
  :: (Semiring a, Applicative f, Foldable f)
  => f (f a) -> f (f a) -> f (f a)
mulMatrix = ...
transpose is the missing piece now. A little bit of Applicative magic can help us out again, though: sequenceA is transpose on ZipLists (McBride and Paterson 2008).
mulMatrix
  :: (Semiring a, Applicative f, Traversable f)
  => f (f a) -> f (f a) -> f (f a)
mulMatrix xs ys =
  fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
  where
    cs = sequenceA ys
One further generalization: the two fs don’t actually need to be the same:

mulMatrix
  :: (Applicative n
     ,Traversable m
     ,Applicative m
     ,Applicative p
     ,Semiring a)
  => n (m a) -> m (p a) -> n (p a)
mulMatrix xs ys = fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
  where
    cs = sequenceA ys
Happily, the way that the wrappers (n, m, and p) match up coincides precisely with how matrix dimensions match up in matrix multiplication. Quoting from the Wikipedia definition:
if $A$ is an $n \times m$ matrix and $B$ is an $m \times p$ matrix, their matrix product $AB$ is an $n \times p$ matrix
This function is present in the linear package with some different constraints. In fairness, Applicative probably isn’t the best thing to use here, since it doesn’t work for so many instances (MonadZip or something similar may be more suitable), but it’s very handy to have, and works out of the box for types like:
data Three a
  = Three a a a
  deriving (Functor, Foldable, Traversable, Eq, Ord, Show)

instance Applicative Three where
  pure x = Three x x x
  Three fx fy fz <*> Three xx xy xz = Three (fx xx) (fy xy) (fz xz)
Which makes it (to my mind) useful enough to keep. Also, it hugely simplified the code I had for matrix multiplication on square matrices, from Okasaki (1999).
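For instance, with the generalized mulMatrix and the Semiring Integer instance from above, Three (Three a) works as a 3×3 matrix (identity and m here are just illustrative names):

```haskell
identity, m :: Three (Three Integer)
identity = Three (Three 1 0 0) (Three 0 1 0) (Three 0 0 1)
m        = Three (Three 1 2 3) (Three 4 5 6) (Three 7 8 9)

-- mulMatrix identity m == m
-- mulMatrix m identity == m
```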
If you’re putting a general class in a library that you want people to use, and there exist sensible instances for common Haskell types, you should probably provide those instances in the library to avoid orphans. The meaning of “sensible” here is vague: generally speaking, if there is only one obvious or clear instance, then it’s sensible. For a list instance for the semiring class, for instance, I could figure out several law-abiding definitions for <+>, one, and zero, but only one for <.>: polynomial multiplication. You know, where you multiply two polynomials like so:
$(x^3 + 2x + 3)(5x + 3x^2 + 4) = 3x^5 + 5x^4 + 10x^3 + 19x^2 + 23x + 12$
A more general definition looks something like this:
$(a_0x^0 + a_1x^1 + a_2x^2)(b_0x^0 + b_1x^1 + b_2x^2) =$ $a_0b_0x^0 + (a_0b_1 + a_1b_0)x^1 + (a_0b_2 + a_1b_1 + a_2b_0)x^2 + (a_1b_2 + a_2b_1)x^3 + a_2b_2x^4$
Or, fully generalized:
$c_k = a_0b_k + a_1b_{k-1} + \ldots + a_{k-1}b_1 + a_kb_0$ $f(x) \times g(x) = \sum_{i=0}^{n+m}c_ix^i$
So it turns out that you can represent polynomials pretty elegantly as lists. Take an example from above:
$x^3 + 2x + 3$
And rearrange it in order of the powers of $x$:
$3x^0 + 2x^1 + x^3$
And fill in missing coefficients:
$3x^0 + 2x^1 + 0x^2 + 1x^3$
And then the list representation of that polynomial is the list of those coefficients:
[3, 2, 0, 1]
For me, the definitions of multiplication above were pretty hard to understand. In Haskell, however, the definition is quite beautiful:
instance Semiring a => Semiring [a] where
  one = [one]
  zero = []
  [] <+> ys = ys
  xs <+> [] = xs
  (x:xs) <+> (y:ys) = x <+> y : (xs <+> ys)
  _ <.> [] = []
  [] <.> _ = []
  (x:xs) <.> (y:ys) = (x<.>y) : map (x<.>) ys <+> xs <.> (y:ys)
This definition for <.> can be found on page 4 of McIlroy (1999), although there was a version of the paper with a slightly different definition:
_ <.> [] = []
[] <.> _ = []
(x:xs) <.> (y:ys)
  = (x<.>y) : (map (x<.>) ys <+> map (<.>y) xs <+> (zero : (xs <.> ys)))
Similar to one which appeared in Dolan (2013).
As it happens, I prefer the first definition. It’s shorter, and I figured out how to write it as a fold:
_ <.> [] = []
xs <.> ys = foldr f [] xs
  where
    f x zs = map (x <.>) ys <+> (zero : zs)
And if you inline the <+>, you get a reasonable speedup:
xs <.> ys = foldr f [] xs
  where
    f x zs = foldr (g x) id ys (zero : zs)
    g x y a (z:zs) = x <.> y <+> z : a zs
    g x y a [] = x <.> y : a []
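Either version computes the same polynomial product. Multiplying $x^3 + 2x + 3$ (the list [3,2,0,1] from earlier) by $3x^2 + 5x + 4$ (as [4,5,3]) gives, assuming the instance above:

```haskell
[3,2,0,1] <.> [4,5,3] :: [Integer]
-- [12,23,19,10,5,3], i.e. 12 + 23x + 19x^2 + 10x^3 + 5x^4 + 3x^5
```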
The definition of <+> can also use a fold on either side for fusion purposes:
(<+>) = foldr f id
  where
    f x xs (y:ys) = x <+> y : xs ys
    f x xs [] = x : xs []

(<+>) = flip (foldr f id)
  where
    f y ys (x:xs) = x <+> y : ys xs
    f y ys [] = y : ys []
There are rules in the library to choose one of the above definitions if fusion is available.
This definition is much more widely useful than it may seem at first. Say, for instance, you wanted to search through pairs of things from two infinite lists. You can’t use the normal way to pair things for lists, the Cartesian product, because it will diverge:
[(x,y) | x <- [1..], y <- [1..]]
-- [(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),(1,8),(1,9),(1,10)...
You’ll never get beyond 1 in the first list. Zipping isn’t an option either, because you won’t really explore the search space, only corresponding pairs. Brent Yorgey showed that if you want a list like this:
[(y,x-y) | x <- [0..], y <- [0..x]]
-- [(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0)...
Then what you’re looking for is a convolution (the same thing as polynomial multiplication). <.> above can be adapted readily:
convolve :: [a] -> [b] -> [[(a,b)]]
convolve xs ys = foldr f [] xs
  where
    f x zs = foldr (g x) id ys ([] : zs)
    g x y a (z:zs) = ((x, y) : z) : a zs
    g x y a [] = [(x, y)] : a []
Flatten out this result to get your ordering. This convolution is a little different from the one in the blog post: by inlining <+> we can avoid the expensive ++ function, without using difference lists.
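A small check on finite lists (each inner list of convolve’s result is one anti-diagonal of the search space):

```haskell
convolve "ab" "cd"
-- [[('a','c')],[('a','d'),('b','c')],[('b','d')]]
concat (convolve "ab" "cd")
-- [('a','c'),('a','d'),('b','c'),('b','d')]
```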
Here’s another cool use of lists as polynomials: they can be used as a positional numeral system. Most common numeral systems are positional, including Arabic (the system you most likely use, where twentyfour is written as 24) and binary. Nonpositional systems are things like Roman numerals. Looking at the Arabic system for now, we see that the way of writing down numbers:
$1989$
Can be thought of as the sum of each digit multiplied by ten to the power of its position:

$1989 = 1 \times 10^3 + 9 \times 10^2 + 8 \times 10^1 + 9 \times 10^0$ $1989 = 1 \times 1000 + 9 \times 100 + 8 \times 10 + 9 \times 1$ $1989 = 1000 + 900 + 80 + 9$ $1989 = 1989$
Where the positions are numbered from the right. In other words, it’s our polynomial list from above in reverse. As well as that, the convolution is long multiplication.
Now, taking this straight off we can try some examples:
-- 12 + 15 = 27
[2, 1] <+> [5, 1] == [7, 2]
-- 23 * 2 = 46
[3, 2] <.> [2] == [6, 4]
The issue, of course, is that we’re not handling carrying properly:
[6] <+> [6] == [12]
No matter: we can perform all the carries after the addition, and everything works out fine:
import Data.List (unfoldr)
import Data.Tuple (swap)

carry :: Integral a => a -> [a] -> [a]
carry base xs = foldr f (toBase base) xs 0
  where
    f e a cin = r : a q
      where
        (q,r) = quotRem (cin + e) base

toBase :: Integral a => a -> a -> [a]
toBase base = unfoldr f
  where
    f 0 = Nothing
    f n = Just (swap (quotRem n base))
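Revisiting the carry-less [6] <+> [6] example with these helpers (digits are least-significant first):

```haskell
carry 10 ([6] <+> [6])  -- [2,1], i.e. 12
toBase 10 1989          -- [9,8,9,1]
```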
Wrap the whole thing in a newtype and we can have a Num instance:

newtype Positional = Positional
  { withBase :: Integer -> [Integer]
  }

instance Num Positional where
  Positional x + Positional y = Positional (carry <*> (x <+> y))
  Positional x * Positional y = Positional (carry <*> (x <.> y))
  fromInteger m = Positional (\base -> toBase base m)
  abs = id
  signum = id
  negate = id

toDigits :: Integer -> Positional -> [Integer]
toDigits base p = reverse (withBase p base)
This also lets us choose our base after the fact:
sumHundred = (sum . map fromInteger) [1..100]

toDigits 10 sumHundred
-- [5,0,5,0]
toDigits 2 sumHundred
-- [1,0,0,1,1,1,0,1,1,1,0,1,0]
All the handoptimizing, inlining, and fusion magic in the world won’t make a listbased implementation of convolution faster than a proper one on vectors, unfortunately. In particular, for larger vectors, a fast Fourier transform can be used. Also, usually code like this will be parallelized, rather than sequential. That said, it can be helpful to implement the slower version on vectors, in the usual indexed way, for comparison’s sake:
instance Semiring a => Semiring (Vector a) where
  one = Vector.singleton one
  zero = Vector.empty
  xs <+> ys =
    case compare (Vector.length xs) (Vector.length ys) of
      EQ -> Vector.zipWith (<+>) xs ys
      LT -> Vector.unsafeAccumulate (<+>) ys (Vector.indexed xs)
      GT -> Vector.unsafeAccumulate (<+>) xs (Vector.indexed ys)
  signal <.> kernel
    | Vector.null signal = Vector.empty
    | Vector.null kernel = Vector.empty
    | otherwise = Vector.generate (slen + klen - 1) f
    where
      f n =
        foldl'
          (\a k ->
             a <+>
             Vector.unsafeIndex signal k <.>
             Vector.unsafeIndex kernel (n - k))
          zero
          [kmin .. kmax]
        where
          !kmin = max 0 (n - (klen - 1))
          !kmax = min n (slen - 1)
      !slen = Vector.length signal
      !klen = Vector.length kernel
As has been observed before (Rivas, Jaskelioff, and Schrijvers 2015), there’s a pretty suggestive similarity between semirings and the Applicative/Alternative classes in Haskell:
class Semiring a where
  one :: a
  zero :: a
  (<+>) :: a -> a -> a
  (<.>) :: a -> a -> a

class Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

class Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a
So can our implementation of convolution be used to implement the methods for these classes? Partially:
newtype Search f a = Search { runSearch :: [f a] }

instance Functor f => Functor (Search f) where
  fmap f (Search xs) = Search ((fmap.fmap) f xs)

instance Alternative f => Applicative (Search f) where
  pure x = Search [pure x]
  _ <*> Search [] = Search []
  Search xs <*> Search ys = Search (foldr f [] xs)
    where
      f x zs = foldr (g x) id ys (empty : zs)
      g x y a (z:zs) = (x <*> y <|> z) : a zs
      g x y a [] = (x <*> y) : a []

instance Alternative f => Alternative (Search f) where
  Search xs <|> Search ys = Search (go xs ys)
    where
      go [] ys = ys
      go xs [] = xs
      go (x:xs) (y:ys) = (x <|> y) : go xs ys
  empty = Search []
At first, this seems perfect: the types all match up, and the definitions seem sensible. The issue is with the laws: Applicative and Alternative are missing four that semirings require. In particular: commutativity of plus, annihilation by zero, and distributivity left and right:
xs <|> ys = ys <|> xs
empty <*> xs = fs <*> empty = empty
fs <*> (xs <|> ys) = fs <*> xs <|> fs <*> ys
(fs <|> gs) <*> xs = fs <*> xs <|> gs <*> xs
The vast majority of the instances of Alternative today fail one or more of these laws. Taking lists as an example, ++ obviously isn’t commutative, and <*> only distributes when it’s on the right.
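A concrete failure for lists: distributing <*> over <|> from the right preserves the answer, but from the left it reorders the results (fs here is just an illustrative name):

```haskell
import Control.Applicative ((<|>))

fs :: [Int -> Int]
fs = [(+1), (*2)]

-- Right distribution holds:
-- (fs <|> [(+3)]) <*> [1]            == [2,2,4]
-- (fs <*> [1]) <|> ([(+3)] <*> [1])  == [2,2,4]

-- Left distribution fails:
-- fs <*> ([1] <|> [2])               == [2,3,2,4]
-- (fs <*> [1]) <|> (fs <*> [2])      == [2,2,3,4]
```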
What’s the problem, though? Polynomial multiplication follows more laws than those required by Applicative: why should that worry us? Unfortunately, in order for multiplication to follow those laws, it relies on the underlying semiring being law-abiding, and it fails the Applicative laws when the underlying semiring isn’t.
There are two angles from which we could come at this problem: either we relax the semiring laws and try to make our implementation of convolution rely on them as little as possible, or we find Alternative instances which follow the semiring laws. Or we could meet in the middle, relaxing the laws as much as possible until we find some Alternatives that meet our standards.
This has actually been accomplished in several papers: the previously mentioned Rivas, Jaskelioff, and Schrijvers (2015) discusses near-semirings, defined as semiring-like structures with associativity, identity, and these two laws:

$0 \times x = 0$ $(x + y) \times z = (x \times z) + (y \times z)$
In contrast to normal semirings, zero only annihilates when it’s on the left, and multiplication only distributes over addition when it’s on the right. Addition is not required to be commutative.
The lovely paper Spivey (2009) has a similar concept: a “bunch”.
class Bunch m where
  return :: a -> m a
  (>>=) :: m a -> (a -> m b) -> m b
  zero :: m a
  (<|>) :: m a -> m a -> m a
  wrap :: m a -> m a
The laws are all the same (with <*> implemented in terms of >>=), and the extra wrap operation can be expressed like so:
wrap :: Alternative f => Search f a -> Search f a
wrap (Search xs) = Search (empty : xs)
A definition of >>= for our polynomials is also provided:
[] >>= _ = []
(x:xs) >>= f = foldr (<|>) empty (fmap f x) <|> wrap (xs >>= f)
This will require the underlying f to be Foldable. We can inline a little, and express the whole thing as a fold:
instance (Foldable f, Alternative f) => Monad (Search f) where
  Search xs >>= k = foldr f empty xs
    where
      f e a = foldr ((<|>) . k) (wrap a) e
For Search to meet the requirements of a bunch, the paper notes that the f must be assumed to be a bag, i.e., the order of its elements must be ignored.
Kiselyov et al. (2005) kind of goes the other direction, defining a monad which has fair disjunction and conjunction. Unfortunately, the fair conjunction loses associativity.
The end of the paper on algebras for combinatorial search wonders if notions of distance could be added to some of the algebras. I think that should be as simple as supplying a suitable near-semiring for f, but the definition of >>= would need to be changed. The near-semiring I had in mind was the probability monad. It works correctly if inlined:
import Data.Bifunctor (first)

newtype Search s a = Search { runSearch :: [[(a,s)]] }

instance Functor (Search s) where
  fmap f (Search xs) = Search ((fmap.fmap.first) f xs)

instance Semiring s => Applicative (Search s) where
  pure x = Search [[(x,one)]]
  _ <*> Search [] = Search []
  Search xs <*> Search ys = Search (foldr f [] xs)
    where
      f x zs = foldr (g x) id ys (empty : zs)
      g x y a (z:zs) = (m x y ++ z) : a zs
      g x y a [] = m x y : a []
      m ls rs = [(l r, lp<.>rp) | (l,lp) <- ls, (r,rp) <- rs]

instance Semiring s => Alternative (Search s) where
  Search xs <|> Search ys = Search (go xs ys)
    where
      go [] ys = ys
      go xs [] = xs
      go (x:xs) (y:ys) = (x ++ y) : go xs ys
  empty = Search []

wrap :: Search s a -> Search s a
wrap (Search xs) = Search ([] : xs)

instance Semiring s => Monad (Search s) where
  Search xs >>= k = foldr f empty xs
    where
      f e a = foldr ((<|>) . uncurry (mulIn . k)) (wrap a) e
      mulIn (Search x) xp = Search ((fmap.fmap.fmap) (xp<.>) x)
But I couldn’t figure out how to get it to work for a more generalized inner monad. The above could probably be sped up, or randomized, using the many wellknown techniques for probability monad optimization.
Dolan, Stephen. 2013. “Fun with semirings: A functional pearl on the abuse of linear algebra.” In, 48:101. ACM Press. doi:10.1145/2500365.2500613. https://www.cl.cam.ac.uk/~sd601/papers/semirings.pdf.
Kiselyov, Oleg, Chung-chieh Shan, Daniel P. Friedman, and Amr Sabry. 2005. “Backtracking, interleaving, and terminating monad transformers (functional pearl).” ACM SIGPLAN Notices 40 (9): 192–203. http://okmij.org/ftp/Computation/monads.html#LogicT.
McBride, Conor, and Ross Paterson. 2008. “Applicative programming with effects.” Journal of Functional Programming 18 (1): 1–13. http://strictlypositive.org/Idiom.pdf.
McIlroy, M. Douglas. 1999. “Power Series, Power Serious.” Journal of Functional Programming 9 (3) (May): 325–337. doi:10.1017/S0956796899003299. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.333.3156&rep=rep1&type=pdf.
Okasaki, Chris. 1999. “From Fast Exponentiation to Square Matrices: An Adventure in Types.” In Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP’99), Paris, France, September 27-29, 1999, 34:28. ACM. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.456.357&rep=rep1&type=pdf.
Rivas, Exequiel, Mauro Jaskelioff, and Tom Schrijvers. 2015. “From monoids to near-semirings: The essence of MonadPlus and Alternative.” In Proceedings of the 17th International Symposium on Principles and Practice of Declarative Programming, 196–207. ACM. doi:10.1145/2790449.2790514. http://www.fceia.unr.edu.ar/~mauro/pubs/FromMonoidstoNearsemirings.pdf.
Spivey, J. Michael. 2009. “Algebras for combinatorial search.” Journal of Functional Programming 19 (3-4) (July): 469–487. doi:10.1017/S0956796809007321. https://pdfs.semanticscholar.org/db3e/373bb6e7e7837ebc524da0a25903958554ed.pdf.