Posted on October 17, 2022

Here’s a cool trick:

minimum :: Ord a => [a] -> a
minimum = head . sort

This is $\mathcal{O}(n)$ in Haskell, not $\mathcal{O}(n \log n)$ as you might expect. And this isn’t because Haskell is using some weird linear-time sorting algorithm; indeed, the following is $\mathcal{O}(n \log n)$:

maximum :: Ord a => [a] -> a
maximum = last . sort

No: since the implementation of minimum above only demands the first element of the list, and since sort has been carefully implemented, only a linear amount of work will be done to retrieve it.

It’s not easy to structure programs to have the same property as sort does above: to be maximally lazy, such that unnecessary work is not performed. Today I was working on a maximally lazy implementation of the following program:

groupOn :: Eq k => (a -> k) -> [a] -> [(k,[a])]
groupOn = ...

>>> groupOn (rem 2) [1..5]
[(1,[1,3,5]),(0,[2,4])]

>>> groupOn (rem 3) [5,8,3,6,2]
[(2,[5,8,2]),(0,[3,6])]

This function groups the elements of a list according to some key function. The desired behaviour here is a little subtle: we don’t want to just group adjacent elements, for instance.

groupOn (rem 3) [5,8,3,6,2] ≢ [(2,[5,8]),(0,[3,6]),(2,[2])]

And we don’t want to reorder the elements of the list by the keys:

groupOn (rem 3) [5,8,3,6,2] ≢ [(0,[3,6]),(2,[5,8,2])]

These constraints make it especially tricky to make this function lazy. In fact, at first glance, it seems impossible. What should, for instance, groupOn id [1..] return? It can’t even fill out the first group, since it will never find another 1. However, it can fill out the first key. And, in fact, the second. And it can fill out the first element of the first group. Precisely:

groupOn id [1..] ≡ [(1,1:⊥), (2,2:⊥), (3,3:⊥), ...

Another example is groupOn id (repeat 1), or groupOn id (cycle [1,2,3]). These each have partially-defined answers:

groupOn id (repeat 1)      ≡ (1,repeat 1):⊥

groupOn id (cycle [1,2,3]) ≡ (1,repeat 1):(2,repeat 2):(3,repeat 3):⊥

So there is some kind of well-defined lazy semantics for this function. The puzzle I was interested in was defining an efficient implementation for these semantics.

# The Slow Case

The first approximation to a solution I could think of is the following:

groupOn :: Ord k => (a -> k) -> [a] -> [(k, [a])]
groupOn k = Map.toList . Map.fromListWith (++) . map (\x -> (k x, [x]))

In fact, if you don’t care about laziness, this is probably the best solution: it’s $\mathcal{O}(n \log n)$, it performs well (practically as well as asymptotically), and it has the expected results.

However, there are problems. Primarily this solution cares about ordering, which we don’t want. We want to emit the results in the same order that they were in the original list, and we don’t necessarily want to require an ordering on the elements (for the efficient solution we will relax this last constraint).

Instead, let’s implement our own “map” type that is inefficient, but more general.

type Map a b = [(a,b)]

insertWith :: Eq a => (b -> b -> b) -> a -> b -> Map a b -> Map a b
insertWith f k v [] = [(k,v)]
insertWith f k v ((k',v'):xs)
| k == k'   = (k',f v v') : xs
| otherwise = (k',v') : insertWith f k v xs

groupOn :: Eq k => (a -> k) -> [a] -> [(k, [a])]
groupOn k = foldr (uncurry (insertWith (++))) [] . map (\x -> (k x, [x]))

The problem here is that it’s not lazy enough. insertWith is strict in its last argument, which means that using foldr doesn’t gain us anything laziness-wise.

There is some extra information we can use to drive the result: we know that the result will have keys that are in the same order as they appear in the list, with duplicates removed:

groupOn :: Eq k => (a -> k) -> [a] -> [(k, [a])]
groupOn k xs = map _ ks
where
ks = map k xs

From here, we can get what the values should be from each key by filtering the original list:

groupOn :: Eq k => (a -> k) -> [a] -> [(k,[a])]
groupOn key xs = map (\k -> (k, filter ((k==) . key) xs)) (nub (map key xs))

Using a kind of Schwartzian transform yields the following slight improvement:

groupOn :: Eq k => (a -> k) -> [a] -> [(k,[a])]
groupOn key xs = map (\k -> (k , map snd (filter ((k==) . fst) ks))) (nub (map fst ks))
where
ks = map (\x -> (key x, x)) xs

But this traverses the same list multiple times unnecessarily. The problem is that we’re repeating a lot of work between nub and the rest of the algorithm.

The following is much better:

groupOn :: Eq k => (a -> k) -> [a] -> [(k,[a])]
groupOn key = go . map (\x -> (key x, x))
where
go [] = []
go ((k,x):xs) = (k,x:map snd y) : go ys
where
(y,ys) = partition ((k==).fst) xs

First, we perform the Schwartzian transform optimisation. The work of the algorithm is done in the go helper. The idea is to filter out duplicates as we encounter them: when we encounter (k,x) we can keep it immediately, but then we split the rest of the list into the components that have the same key as this element, and the ones that differ. The ones that have the same key can form the collection for this key, and those that differ are what we recurse on.

This partitioning also avoids re-traversing elements we know to be already accounted for in a previous group. I think that this is the most efficient (modulo some inlining and strictness improvements) algorithm that can do groupOn with just an Eq constraint.

# A Faster Version

The reason that the groupOn above is slow is that every element returned has to traverse the entire rest of the list to remove duplicates. This is a classic pattern of quadratic behaviour: we can improve it by using the same trick as quick sort, by partitioning the list into lesser and greater elements on every call.

groupOnOrd :: Ord k => (a -> k) -> [a] -> [(k,[a])]
groupOnOrd key = go . map (\x -> (key x, x))
where
go [] = []
go ((k,x):xs) = (k,x:e) : go lt ++ go gt
where
(e,lt,gt) = foldr split ([],[],[]) xs
split ky@(k',y) ~(e,lt,gt) = case compare k' k of
LT -> (e, ky:lt, gt)
EQ -> (y:e, lt, gt)
GT -> (e, lt, ky:gt)

While this is $\mathcal{O}(n \log n)$, and it does group elements, it also reorders the underlying list. Let’s fix that by tagging the incoming elements with their positions, and then using those positions to order them back into their original configuration:

groupOnOrd :: Ord k => (a -> k) -> [a] -> [(k,[a])]
groupOnOrd k = map (\(_,k,xs) -> (k,xs)) . go . zipWith (\i x -> (i, k x, x)) [0..]
where
go [] = []
go ((i, k, x):xs) = (i, k, x : e) : merge (go l) (go g)
where
(e, l, g) = foldr split ([],[],[]) xs

split ky@(_,k',y) ~(e, l, g) = case compare k' k of
LT -> (e  , ky : l,      g)
EQ -> (y:e,      l,      g)
GT -> (e  ,      l, ky : g)

merge [] gt = gt
merge lt [] = lt
merge (l@(i,_,_):lt) (g@(j,_,_):gt)
| i <= j    = l : merge lt (g:gt)
| otherwise = g : merge (l:lt) gt

This is close, but still not right. This isn’t yet lazy. The merge function is strict in both arguments.

However, we have all the information we need to unshuffle the lists without having to inspect them. In split, we know which direction we put each element: we can store that info without using indices.

groupOnOrd :: Ord k => (a -> k) -> [a] -> [(k,[a])]
groupOnOrd k = catMaybes . go . map (\x -> (k x, x))
where
go [] = []
go ((k,x):xs) = Just (k, x : e) : merge m (go l) (go g)
where
(e, m, l, g) = foldr split ([],[],[],[]) xs

split ky@(k',y) ~(e, m, l, g) = case compare k' k of
LT -> (  e, LT : m, ky : l,      g)
EQ -> (y:e, EQ : m,      l,      g)
GT -> (  e, GT : m,      l, ky : g)

merge []        lt     gt     = []
merge (EQ : xs) lt     gt     = Nothing : merge xs lt gt
merge (LT : xs) (l:lt) gt     = l       : merge xs lt gt
merge (GT : xs) lt     (g:gt) = g       : merge xs lt gt

What we generate here is a [Ordering]: this list tells us the result of all the compare operations on the input list. Then, in merge, we invert the action of split, rebuilding the original list without inspecting either lt or gt.

And this solution works! It’s $\mathcal{O}(n \log n)$, and fully lazy.

>>> map fst . groupOnOrd id $[1..] [1..] >>> groupOnOrd id$ cycle [1,2,3]
(1,repeat 1):(2,repeat 2):(3,repeat 3):⊥

>>> groupOnOrd (rem 3) [1..]
(1,[1,4..]):(2,[2,5..]):(0,[3,6..]):⊥

The finished version of these two functions, along with some benchmarks, is available here.