Folding Two Things at Once
There’s a whole family of Haskell brainteasers surrounding one
function: foldr
. The
general idea is to convert some function on lists which uses recursion
into one that uses foldr
. map
, for
instance:
map :: (a -> b) -> [a] -> [b]
map f = foldr (\e a -> f e : a) []
Some can get a little trickier. dropWhile
, for
instance. (See here
and here for
interesting articles on that one in particular.)
dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p = fst . foldr f ([],[]) where
~(xs,ys) = (if p e then xs else zs, zs) where zs = e : ys f e
Zip
One function which was a little harder to convert than it first
seemed was zip
.
Here’s the first (non) solution:
zip :: [a] -> [b] -> [(a,b)]
zip = foldr f (const []) where
:ys) = (x,y) : xs ys
f x xs (y= [] f _ _ []
The problem with the above isn’t that it doesn’t work: it does. The
problem is that it’s not really using foldr
. It’s
only using it on the first list: there’s still a manual uncons being
performed on the second. Ideally, I would want the function to look
something like this:
zip :: [a] -> [b] -> [(a,b)]
zip xs ys = foldr f (\_ _ -> []) xs (foldr g (const []) ys)
The best solution I found online only dealt with Fold
s, not
Foldable
s. You
can read it here.
Recursive Types
Reworking the solution online for Foldable
s, the
initial intuition is to have the foldr
on the
ys
produce a function which
takes an element of the xs
, and
returns a function which takes an element of the xs
, and so on, finally returning the
created list. The problem with that approach is the types
involved:
zip :: [a] -> [b] -> [(a,b)]
zip xs = foldr f (const []) xs . foldr g (\_ _ -> []) where
= (e1,e2) : (r1 r2)
g e2 r2 e1 r1 = x e r f e r x
You get the error:
Occurs check: cannot construct the infinite type: t0 ~ a -> (t0 -> [(a, b)]) -> [(a, b)]
.
Haskell’s typechecker doesn’t allow for infinitely recursive types.
You’ll be familiar with this problem if you’ve ever tried to encode
the Y-combinator, or if you’ve fiddled around with the recursion-schemes
package. You might also be familiar with the solution: a newtype
,
encapsulating the recursion. In this case, the newtype
looks
very similar to the signature for foldr
:
newtype RecFold a b =
RecFold { runRecFold :: a -> (RecFold a b -> b) -> b }
Now you can insert and remove the RecFold
wrapper, helping the typechecker to understand the recursive types as it
goes:
zip :: [a] -> [b] -> [(a,b)]
zip xs =
foldr f (const []) xs . RecFold . foldr g (\_ _ -> []) where
= (e1,e2) : (r1 (RecFold r2))
g e2 r2 e1 r1 = runRecFold x e r f e r x
As an aside, the performance characteristics of the newtype
wrapper are totally opaque to me. There may be significant improvements
by using coerce
from Data.Coerce,
but I haven’t looked into it.
Generalised Zips
The immediate temptation from the function above is to generalise it.
First to zipWith
,
obviously:
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith c xs =
foldr f (const []) xs . RecFold . foldr g (\_ _ -> []) where
= c e1 e2 : (r1 (RecFold r2))
g e2 r2 e1 r1 = runRecFold x e r f e r x
What’s maybe a little more interesting, though, would be a foldr
on two
lists. Something which folds through both at once, using a supplied
combining function:
foldr2 :: (Foldable f, Foldable g)
=> (a -> b -> c -> c)
-> c -> f a -> g b -> c
=
foldr2 c i xs foldr f (const i) xs . RecFold . foldr g (\_ _ -> i) where
= c e1 e2 (r1 (RecFold r2))
g e2 r2 e1 r1 = runRecFold x e r f e r x
Of course, once you can do two, you can do three:
foldr3 :: (Foldable f, Foldable g, Foldable h)
=> (a -> b -> c -> d -> d)
-> d -> f a -> g b -> h c -> d
=
foldr3 c i xs ys foldr f (const i) xs . RecFold . foldr2 g (\_ _ -> i) ys where
= c e1 e2 e3 (r1 (RecFold r2))
g e2 e3 r2 e1 r1 = runRecFold x e r f e r x
And so on.
There’s the added benefit that the above functions work on much more than just lists.
Catamorphisms
Getting a little formal about the above functions, a fold
can be described as a
catamorphism. This is a name for a pattern of breaking down some
recursive structure. There’s a bunch of them in the recursion-schemes
package. The question is, then: can you express the above as a kind of
catamorphism? Initially, using the same techniques as before, you
can:
newtype RecF f a = RecF { unRecF :: Base f (RecF f a -> a) -> a }
zipo :: (Functor.Foldable f, Functor.Foldable g)
=> (Base f (RecF g c) -> Base g (RecF g c -> c) -> c)
-> f -> g -> c
= cata (flip unRecF) ys (cata (RecF . alg) xs) zipo alg xs ys
Then, coming full circle, you get a quite nice encoding of zip
:
zip :: [a] -> [b] -> [(a,b)]
zip = zipo alg where
Nil _ = []
alg Nil = []
alg _ Cons x xs) (Cons y ys) = (x, y) : ys xs alg (
However, the RecF
is a
little ugly. In fact, it’s possible to write the above without any
recursive types. (It’s possible that you could do the same with foldr2
as well, but I haven’t figured
it out yet)
zipo :: (Functor.Foldable f, Functor.Foldable g)
=> (Base f (g -> c) -> Base g g -> c)
-> f -> g -> c
= cata (\x -> alg x . project) zipo alg
And the new version of zip
has a
slightly more natural order of arguments:
zip :: [a] -> [b] -> [(a,b)]
zip = zipo alg where
Nil _ = []
alg Nil = []
alg _ Cons x xs) (Cons y ys) = (x,y) : xs ys alg (
Zipping Into
There’s one more issue, though, that’s slightly tangential. A lot of
the time, the attraction of rewriting functions using folds and
catamorphisms is that the function becomes more general: it no longer is
restricted to lists. For zip
, however,
there’s still a pesky list left in the signature:
zip :: (Foldable f, Foldable g) => f a -> g b -> [(a,b)]
It would be a little nicer to be able to zip through something
preserving the structure of one of the things being zipped
through. For no reason in particular, let’s assume we’ll preserve the
structure of the first argument. The function will have to account for
the second argument running out before the first, though. A Maybe
can
account for that:
zipInto :: (Foldable f, Foldable g)
=> (a -> Maybe b -> c)
-> f a -> g b -> f c
If the second argument runs out, Nothing
will
be passed to the combining function.
It’s clear that this isn’t a fold over the first argument, it’s a traversal. A first go at the function uses the state monad, but restricts the second argument to a list:
zipInto :: Traversable f => (a -> Maybe b -> c) -> f a -> [b] -> f c
= evalState (traverse f xs) ys where
zipInto c xs ys = do
f x <- gets uncons
h case h of
Just (y,t) -> do
put tpure (c x (Just y))
Nothing -> pure (c x Nothing)
That code can be cleaned up a little:
zipInto :: Traversable f => (a -> Maybe b -> c) -> f a -> [b] -> f c
= evalState . traverse (state . f . c) where
zipInto c = (x Nothing, [])
f x [] :ys) = (x (Just y), ys) f x (y
But really, the uncons needs to go. Another newtype
wrapper is needed, and here’s the end result:
newtype RecAccu a b =
RecAccu { runRecAccu :: a -> (RecAccu a b, b) }
zipInto :: (Traversable t, Foldable f)
=> (a -> Maybe b -> c) -> t a -> f b -> t c
=
zipInto f xs snd . flip (mapAccumL runRecAccu) xs . RecAccu . foldr h i where
= (RecAccu i, f e Nothing)
i e = (RecAccu a, f e1 (Just e2)) h e2 a e1