A Binomial Urn

Posted on January 15, 2019
Part 3 of a 3-part series on Balanced Folds

When we started the series, we wanted to find a “better” fold: one that was more balanced than either foldl or foldr (in its placement of parentheses). Both of these are about as unbalanced as you can get:

>>> foldr (+) 0 [1,2,3]
1 + (2 + (3 + 0))
>>> foldl (+) 0 [1,2,3]
((0 + 1) + 2) + 3

The first better fold I found was Jon Fairbairn’s simple treeFold:

treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold f = go
where
go x [] = x
go a  (b:l) = go (f a b) (pairMap l)
pairMap (x:y:rest) = f x y : pairMap rest
pairMap xs = xs

>>> treeFold (+) 0 [1,2,3]
(0 + 1) + (2 + 3)

Already this function was kind of magical: if your binary operator merges two sorted lists, foldr will give you insertion sort, whereas treeFold will give you merge sort; for summing floats, treeFold has a lower error growth than sum. By dividing up the work better, we were able to improve the characteristics of many algorithms automatically. We also saw that it could easily be made parallel:

parseq :: a -> b -> b
parseq a b =
runST
(bool (par a b) (seq a b) <\$>
unsafeIOToST (liftA2 (>) numSparks getNumCapabilities))

treeFoldParallel :: (a -> a -> a) -> a -> [a] -> a
treeFoldParallel f =
treeFold
(\l r ->
r parseq (l parseq f l r))

In the next post, we saw how we could make the fold incremental, by using binary number representations for data structures. This let us do 2 things: it meant the fold was structurally terminating, so it would pass the termination checker (efficiently) in languages like Agda or Idris, and it meant we could write scanl using the fold. The scanl was also efficient: you could run the fold at any point in $\mathcal{O}(\log n)$ time, and work would be shared between subsequent runs. Effectively, this let us use it to solve greedy optimization problems. We also saw how it was effectively constructing an implicit binomial priority queue under the hood, and how it exploited laziness to get sharing.

I’ve gotten huge mileage out of this fold and the general ideas about it, and today I’m going to show one more use of it. We’re going to improve some of the asymptotics of the data structure presented in Lampropoulos, Spector-Zabusky, and Foner (2017).

A Random Urn

The paper opens with the problem:

Suppose you have an urn containing two red balls, four green balls, and three blue balls. If you take three balls out of the urn, what is the probability that two of them are green?

If you were to take just one ball out of the earn, calculating the associated probabilities would be easy. Once you get to the second, though, you have to update the previous probability based on what ball was removed. In other words, we need to be able to dynamically update the distribution.

Using lists, this would obviously become an $\mathcal{O}(n)$ operation. In the paper, an almost-perfect binary tree is used. This turns the operation into one that’s $\mathcal{O}(\log n)$. The rest of the operations have the following complexities:

Operation Complexity
insert $\mathcal{O}(\log n)$
remove $\mathcal{O}(\log n)$
fromList $\mathcal{O}(n)$

As a quick spoiler, the improved version presented here has these complexities:

Operation Complexity
insert $\mathcal{O}(1)$
remove $\mathcal{O}(\log n)$
merge $\mathcal{O}(\log n)$
fromList $\mathcal{O}(n)$

We add another operation (merge), which means that the new structure is viable as an instance of Alternative, Monad, and so on, making it an efficient monad for weighted backtracking search.

Priority Queues

The key thing to notice in the paper which will let us improve the structure is that what they’re designing is actually a priority queue. Well, a weird looking priority queue, but a priority queue nonetheless.

Think about it like a max-priority queue (pop returns the largest element first), with a degree of “randomization”. In other words, when you go to do a pop, all of the comparisons between the ordering keys (the weights in this case) sprinkles some randomness into the equation, meaning that instead of 1 < 2 returning True, it returns True $\frac{2}{3}$ of the time, and False the other $\frac{1}{3}$.

This way of doing things means that not every priority queue is suitable: we want to run comparisons at pop time (not insert), so a binary heap (for instance) won’t do. At branches (non-leaves), the queue will only be allowed store summaries of the data, not the “max element”.

The one presented in the paper is something like a Braun priority queue: the $\mathcal{O}(n)$ fromList implementation is reminiscent of the one in Okasaki (1997).

So what priority queue can we choose to get us the desired efficiency? Why, a binomial one of course!

The Data Structure

The urn structure itself looks a lot like a binomial heap:

data Tree a
= Tree
{ weight :: {-# UNPACK #-} !Word
, branch :: Node a
}

data Node a
= Leaf a
| Branch (Tree a) (Node a)

data Heap a
= Nil
| Cons {-# UNPACK #-} !Word (Tree a) (Heap a)

data Urn a =
Urn {-# UNPACK #-} !Word
!(Heap a)

By avoiding the usual Skip constructors you often see in a binomial heap we save a huge amount of space. Instead, we store the “number of zeroes before this bit”. Another thing to point out is that only left branches in the trees store their weight: the same optimization is made in the paper.

Insertion is not much different from insertion for a usual binomial priority queue, although we don’t need to do anything to merge the trees:

insertHeap :: Word -> a -> Heap a -> Heap a
insertHeap i' x' = go 0 (Tree i' (Leaf x'))
where
go !i x Nil = Cons i x Nil
go !i x (Cons 0 y ys) = go (i+1) (mergeTree x y) ys
go !i x (Cons j y ys) = Cons i x (Cons (j-1) y ys)

mergeTree :: Tree a -> Tree a -> Tree a
mergeTree xs ys =
Tree
(weight xs + weight ys)
(Branch xs (branch ys))

insert :: Word -> a -> Urn a -> Urn a
insert i x (Urn w xs) = Urn (w+i) (insertHeap i x xs)

We could potentially get insertion from amortized $\mathcal{O}(1)$ to worst-case $\mathcal{O}(1)$ by using skew binary instead of binary (in fact I am almost sure it’s possible), but then I think we’d lose the efficient merge. I’ll leave exploring that for another day.

To get randomness, we’ll write a very simple class that encapsulates only what we need:

class Sample m where
-- | Inclusive range
inRange :: Word -> Word -> m Word

You can later instantiate this to whatever random monad you end up using. (The same approach was taken in the paper, although we only require Functor here, not Monad).

Sampling (with replacement) first randomly chooses a tree from the top-level list, and then we drill down into that tree with binary search.

sample :: (Functor m, Sample m) => Urn a -> Maybe (m a)
sample (Urn _ Nil) = Nothing
sample (Urn w' (Cons _ x' xs')) = Just (fmap (go x' xs') (inRange 0 (w' - 1)))
where
go x Nil !w = go' w (branch x)
go x (Cons _ y ys) !w
| w < weight x = go' w (branch x)
| otherwise    = go y ys (w - weight x)
go' !_ (Leaf x) = x
go' !i (Branch xs ys)
| i < weight xs = go' i (branch xs)
| otherwise = go' (i - weight xs) ys

So we’re off to a good start, but remove is a complex operation. We take the same route taken in the paper: first, we perform an “uncons”-like operation, which pops out the last inserted element. Then, we randomly choose a point in the tree (using the same logic as in sample), and replace it with the popped element1.

remove :: (Functor m, Sample m) => Urn a -> Maybe (m ((a, Word), Urn a))
remove (Urn w hp) = fmap go' (Heap.uninsert hp)
where
go' (vw,v,hp') = fmap (go hp') (inRange 0 (w-1))
where
go !_  Nil = ((v, vw), Urn 0 Nil)
go !rw vs@(Cons i' x' xs')
| rw < vw = ((v, vw), Urn (w - vw) vs)
| otherwise = replace (rw - vw) i' x' xs'
(\ys yw y -> ((y, yw), Urn (w - yw) ys))

replace !rw i x Nil k = replaceTree rw x (\t -> k (Cons i t Nil))
replace !rw i x xs@(Cons j y ys) k
| rw < weight x = replaceTree rw x (\t -> k (Cons i t xs))
| otherwise = replace (rw - weight x) j y ys (k . Cons i x)

replaceTree !_  (Tree tw (Leaf x)) k = k (Tree vw (Leaf v)) tw x
replaceTree !rw (Tree tw (Branch xs ys)) k
| rw < weight xs = replaceTree rw xs
(\t -> k (Tree (tw + (weight t - weight xs)) (Branch t ys)))
| otherwise = replaceTree (rw - weight xs)
(Tree (tw - weight xs) ys)
(\t -> k (Tree (weight xs + weight t) (Branch xs (branch t))))

Merge is the same as on binomial heaps:

mergeHeap :: Heap a -> Heap a -> Heap a
mergeHeap Nil = id
mergeHeap (Cons i' x' xs') = merger i' x' xs'
where
merger !i x xs Nil = Cons i x xs
merger !i x xs (Cons j y ys) = merge' i x xs j y ys

merge' !i x xs !j y ys = case compare i j of
LT -> Cons i x (merger (j-i-1) y ys xs)
GT -> Cons j y (merger (i-j-1) x xs ys)
EQ -> mergec (succ i) (mergeTree x y) xs ys

mergec !p !t Nil = carryLonger p t
mergec !p !t (Cons i x xs) = mergecr p t i x xs

mergecr !p !t !i x xs Nil = carryLonger' p t i x xs
mergecr !p !t !i x xs (Cons j y ys) = mergec' p t i x xs j y ys

mergec' !p t !i x xs !j y ys = case compare i j of
LT -> mergecr'' p t i x xs (j-i-1) y ys
GT -> mergecr'' p t j y ys (i-j-1) x xs
EQ -> Cons p t (mergec i (mergeTree x y) xs ys)

mergecr'' !p !t  0 x xs !j y ys = mergecr (p+1) (mergeTree t x) j y ys xs
mergecr'' !p !t !i x xs !j y ys = Cons p t (Cons (i-1) x (merger j y ys xs))

carryLonger !i !t Nil = Cons i t Nil
carryLonger !i !t (Cons j y ys) = carryLonger' i t j y ys

carryLonger' !i !t  0 y ys = carryLonger (succ i) (mergeTree t y) ys
carryLonger' !i !t !j y ys = Cons i t (Cons (j-1) y ys)

merge :: Urn a -> Urn a -> Urn a
merge (Urn i xs) (Urn j ys) = Urn (i+j) (mergeHeap xs ys)   

Finger Trees

Again, the cleverness of all the tree folds is that they intelligently batch summarizing operations, allowing you to efficiently so prefix-scan-like operations that exploit sharing.

The bare-bones version just uses binary numbers: you can upgrade the cons operation to worst-case constant-time if you use skew binary. Are there other optimizations? Yes! What if we wanted to stick something on to the other end, for instance? What if we wanted to reverse?

If you figure out a way to do all these optimizations, and put them into one big data structure, you get the mother-of-all “batching” data structures: the finger tree. This is the basis for Haskell’s Data.Sequence, but it can also implement priority queues, urns (I’d imagine), fenwick-tree-like structures, and more.

Uses and Further Work

First and foremost, I should test the above implementations! I’m pretty confident the asymptotics are correct, but I’m certain the implementations have bugs.

The efficient merge is intriguing: it means that Urn could conceivably be Alternative, MonadPlus, etc. I have yet to see a use for that, but it’s interesting nonetheless! I’m constantly looking for a way to express something like Dijkstra’s algorithm algebraicly, using the usual Alternative combinators; I don’t know if this is related.

The other interesting point is that, for this to be an instance of Applicative, it would need some analogue for multiplication for the weights. I’m not sure what that should be.

This is inherently max-priority. It’s not obvious how to translate what we have into a min-priority queue version.

Finally, it might be worth trying out different priority queues (a pairing heap is very similar in structure to this). Also, we could rearrange the weights so that larger ones are higher in each tree: this might give a performance boost.

Lampropoulos, Leonidas, Antal Spector-Zabusky, and Kenneth Foner. 2017. “Ode on a random urn (functional pearl).” In, 26–37. ACM Press. doi:10.1145/3122955.3122959.

Okasaki, Chris. 1997. “Three Algorithms on Braun Trees.” Journal of Functional Programming 7 (6) (November): 661–666. doi:10.1017/S0956796897002876.

1. There’s one extra step I haven’t mentioned: we also must allow the first element (the last inserted) to be chosen, so we run the random-number generator once to check if that’s the element we want to choose.