Some Tricks for List Manipulation
This post is a collection of some of the tricks I’ve learned for manipulating lists in Haskell. Each one starts with a puzzle: you should try the puzzle yourself before seeing the solution!
The Tortoise and the Hare
How can you split a list in half, in one pass, without taking its length?
This first one is a relatively well-known trick, but it occasionally comes in handy, so I thought I’d mention it. The naive way is as follows:
= splitAt (length xs `div` 2) xs splitHalf xs
But it’s unsatisfying: we have to traverse the list twice, and we’re taking its length (which is almost always a bad idea). Instead, we use the following function:
splitHalf :: [a] -> ([a],[a])
= go xs xs
splitHalf xs where
:ys) (_:_:zs) = first (y:) (go ys zs)
go (y= ([],ys) go ys _
The “tortoise and the hare” is the two arguments to go
:
it traverses the second one twice as fast, so when it hits the end, we
know that the first list must be halfway done.
There and Back Again
Given two lists,
xs
andys
, write a function which zipsxs
with the reverse ofys
(in one pass).
There’s a lovely paper (Danvy and Goldberg 2005) which goes though a number of tricks for how to do certain list manipulations “in reverse”. Their technique is known as “there and back again”. However, I’d like to describe a different way to get to the same technique, using folds.
Whenever I need to do some list manipulation in reverse (i.e., I need
the input list to be reversed), I first see if I can rewrite the
function as a fold, and then just switch out foldr
for
foldl
.
For our puzzle here, we need to first write zip
as a
fold:
zip :: [a] -> [b] -> [(a,b)]
zip = foldr f b
where
:ys) = (x,y) : k ys
f x k (y= []
f x k [] = [] b _
If that looks complex, or difficult to write, don’t worry! There’s a
systematic way to get to the above definition from the normal version of
zip
. First, let’s start with a normal zip
:
zip :: [a] -> [b] -> [(a,b)]
zip [] ys = []
zip xs [] = []
zip (x:xs) (y:ys) = (x,y) : zip xs ys
Then, we need to turn it into a case-tree, where the first branch is on the list we want to fold over. In other words, we want the function to look like this:
zip xs = case xs of
???
To figure out the cases, we factor out the cases in the original
function. Since the second clause (zip xs [] = []
) is only
reachable when xs /= []
, it’s effectively a case for the
x:xs
branch.
zip :: [a] -> [b] -> [(a,b)]
zip xs = case xs of
-> \_ -> []
[] :xs -> \case
x-> []
[] :ys -> (x,y) : zip xs ys y
Now, we rewrite the different cases to be auxiliary functions:
zip :: [a] -> [b] -> [(a,b)]
zip xs = case xs of
-> b
[] :xs -> f x xs
xwhere
= \_ -> []
b = \x xs -> \case
f -> []
[] :ys -> (x,y) : zip xs ys y
And finally, we refactor the recursive call to the first case expression.
zip :: [a] -> [b] -> [(a,b)]
zip xs = case xs of
-> b
[] :xs -> f x (zip xs)
xwhere
= \_ -> []
b = \x xs -> \case
f -> []
[] :ys -> (x,y) : xs ys y
Then those two auxiliary functions are what you pass to
foldr
!
So, to reverse it, we simply take wherever we wrote
foldr f b
, and replace it with
foldl (flip f) b
:
zipRev :: [a] -> [b] -> [(a,b)]
= foldl (flip f) b
zipRev where
:ys) = (x,y) : k ys
f x k (y= []
f x k [] = [] b _
Of course, we’re reversing the wrong list here. Fixing that is simple:
zipRev :: [a] -> [b] -> [(a,b)]
= flip (foldl (flip f) b)
zipRev where
:xs) = (x,y) : k xs
f y k (x= []
f y k [] = [] b _
Maintaining Laziness
Rewrite the above function without using continuations.
zipRev
, as written above, actually uses
continuation-passing style. In most languages (including
standard ML, which was the one used in Danvy and Goldberg (2005)), this is
pretty much equivalent to a direct-style implementation (modulo some
performance weirdness). In a lazy language like Haskell, though,
continuation-passing style often makes things unnecessarily strict.
Consider the church-encoded pairs:
newtype Pair a b
= Pair
runPair :: forall c. (a -> b -> c) -> c
{
}
firstC :: (a -> a') -> Pair a b -> Pair a' b
= Pair (\k -> runPair p (k . f))
firstC f p
firstD :: (a -> a') -> (a, b) -> (a', b)
~(x,y) = (f x, y)
firstD f
fstD :: (a, b) -> a
~(x,y) = x
fstD
fstC :: Pair a b -> a
= runPair p const
fstC p
>>> fstC (firstC (const ()) undefined)
undefined
>>> fstD (firstD (const ()) undefined)
()
So it’s sometimes worth trying to avoid continuations if there is a fast direct-style solution. (alternatively, continuations can give you extra strictness when you do want it)
First, I’m going to write a different version of zipRev
,
which folds on the first list, not the second.
= foldl f (\_ r -> r) xs ys []
zipRev xs ys where
:ys) r = k ys ((x,y):r) f k x (y
Then, we inline the definition of foldl
:
= foldr f id xs (\_ r -> r) ys []
zipRev xs ys where
= k (\(y:ys) r -> c ys ((x,y):r)) f x k c
Then, as a hint, we tuple up the two accumulating parameters:
= foldr f id xs snd (ys,[])
zipRev xs ys where
= k (\((y:ys),r) -> c (ys,(x,y):r)) f x k c
What we can see here is that we have two continuations stacked on top of each other. When this happens, they can often “cancel out”, like so:
= snd (foldr f (ys,[]) xs)
zipRev xs ys where
:ys,r) = (ys,(x,y):r) f x (y
And we have our direct-style implementation!
Note 14/05/2019: the “cancel-out” explanation there is a little handwavy, as I’m sure you’ll notice. However, there are a number of excellent explanations on this stackoverflow thread which explain it much better than I ever could. Thanks to Anders Kaseorg, Will Ness, user11228628, and to Joseph Sible -Sible (2019) for asking the question.
Manual Fusion
Detect that a list is a palindrome, in one pass.
We now know a good way to split a list in two, and a good way to zip a list with its reverse. We can combine the two to get a program that checks if a list is a palindrome. Here’s a first attempt:
= all (uncurry (==)) (uncurry zipRev (splitHalf xs)) isPal xs
But this is doing three passes!
To get around it, we can manually do some fusion. Fusion is a technique where we can spot scenarios like the following:
foldr f b (x : y : [])
And translate them into a version without a list:
`f` (y `f` b) x
The trick is making sure that the consumer is written as a fold, and
then we just put its f
and b
in place of the
:
and []
in the producer.
So, when we inline the definition of splitHalf
into
zipRev
, we get the following:
zipRevHalf :: [a] -> [(a,a)]
= snd (go xs xs)
zipRevHalf xs where
:ys) (_:_:zs) = f y (go ys zs)
go (y:ys) [_] = (ys,[])
go (_= (ys,[])
go ys []
:ys,r) = (ys,(x,y):r)
f x (y
= all (uncurry (==)) (zipRevHalf xs) isPal xs
(adding a special case for odd-length lists)
Finally, the all (uncurry (==))
is implemented as a fold
also. So we can fuse it with the rest of the definitions:
isPal :: Eq a => [a] -> Bool
= snd (go xs xs)
isPal xs where
:ys) (_:_:zs) = f y (go ys zs)
go (y:ys) [_] = (ys,True)
go (_= (ys,True)
go ys []
:ys,r) = (ys,(x == y) && r) f x (y
You may have spotted the writer monad over All
there.
Indeed, we can rewrite it to use the monadic bind:
isPal :: Eq a => [a] -> Bool
= getAll (fst (go xs xs)) where
isPal xs :ys) (_:_:zs) = f y =<< go ys zs
go (y:ys) [_] = pure ys
go (_= pure ys
go ys []
:zs) = (All (y == z), zs) f y (z
Eliminating Multiple Passes with Laziness
Construct a Braun tree from a list in linear time.
This is also a very well-known trick (Bird 1984), but today I’m going to use it to write a function for constructing Braun trees.
A Braun tree is a peculiar structure. It’s a binary tree, where adjacent branches can differ in size by only 1. When used as an array, it has lookup times. It’s enumerated like so:
┌─7
┌3┤
│ └11
┌1┤
│ │ ┌─9
│ └5┤
│ └13
0┤
│ ┌─8
│ ┌4┤
│ │ └12
└2┤
│ ┌10
└6┤
└14
The objective is to construct a tree from a list in linear time, in the order defined above. Okasaki (1997) observed that, from the list:
0..14] [
Each level in the tree is constructed from chucks of powers of two. In other words:
0],[1,2],[3,4,5,6],[7,8,9,10,11,12,13,14]] [[
From this, we can write the following function:
= []
rows k [] = (k , take k xs) : rows (2*k) (drop k xs)
rows k xs
= zipWith3 Node xs ts1 ts2
build (k,xs) ts where
= splitAt k (ts ++ repeat Leaf)
(ts1,ts2)
= head . foldr build [Leaf] . rows 1 fromList
The first place we’ll look to eliminate a pass is the
build
function. It combines two rows by splitting the
second in half, and zipping it with the first.
>>> build (3, [x1,x2,x3]) [y1,y2,y3,y4,y5,y6]
[(x1,y1,y4),(x2,y2,y5),(x3,y3,y6)]
We don’t need to store the length of the first list, though, as we are only using it to split the second, and we can do that at the same time as the zipping.
zipUntil :: (a -> b -> c) -> [a] -> [b] -> ([c],[b])
= ([],ys)
zipUntil _ [] ys :xs) (y:ys) = first (f x y:) (zipUntil f xs ys)
zipUntil f (x
>>> zipUntil (,) [1,2] "abc"
1,'a'),(2,'b')],"c") ([(
Using this function in build
looks like the
following:
= zipWith ($) ys ts2
build (k,xs) ts where
= zipUntil Node xs (ts ++ repeat Leaf) (ys,ts2)
That top-level zipWith
is also unnecessary,
though. If we make the program circular, we can produce ts2
as we consume it, making the whole thing single-pass.
= ys
build xs ts where
= zip3Node xs (ts ++ repeat Leaf) ts2
(ys,ts2) :xs) (y:ys) ~(z:zs) = first (Node x y z:) (zip3Node xs ys zs)
zip3Node (x= ([], ys) zip3Node [] ys _
That zip3Node
is a good candidate for rewriting as a
fold, also, making the whole thing look like this:
= []
rows k [] = take k xs : rows (2*k) (drop k xs)
rows k xs
= ys
build xs ts where
= foldr f b xs ts zs
(ys,zs) :ys) ~(z:zs) = first (Node x y z:) (xs ys zs)
f x xs (y= ([],ys)
b ys _
= head . foldr build (repeat Leaf) . rows 1 fromList
To fuse all of those definitions, we first will need to rewrite
rows
as a fold:
= uncurry (:) (foldr f b xs 1 2)
rows xs where
= ([],[])
b _ _ 0 j = ([], uncurry (:) (f x k j (j*2)))
f x k = first (x:) (k (i-1) j) f x k i j
Once we have everything as a fold, the rest of the transformation is pretty mechanical. At the end of it all, we get the following linear-time function for constructing a Braun tree from a list:
fromList :: [a] -> Tree a
= head (l (foldr f b xs 1 2))
fromList xs where
= (repeat Leaf, (repeat Leaf, ys))
b _ _ ys zs
= let (xs, ys) = uncurry k ys in xs
l k
0 j ys zs = ([], (l (f x k j (j*2)), ys))
f x k ~(y:ys) ~(z:zs) = first (Node x y z:) (k (i-1) j ys zs) f x k i j