Folding Two Things at Once

Posted on April 17, 2016
Tags: Haskell

There’s a whole family of Haskell brainteasers surrounding one function: foldr. The general idea is to convert some function on lists which uses recursion into one that uses foldr. map, for instance:

map :: (a -> b) -> [a] -> [b]
map f = foldr (\e a -> f e : a) []

Some can get a little trickier. dropWhile, for instance. (See here and here for interesting articles on that one in particular.)

dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p = fst . foldr f ([],[]) where
  f e ~(xs,ys) = (if p e then xs else zs, zs) where zs = e : ys

Zip

One function which was a little harder to convert than it first seemed was zip.

Here’s the first (non) solution:

zip :: [a] -> [b] -> [(a,b)]
zip = foldr f (const []) where
  f x xs (y:ys) = (x,y) : xs ys
  f _ _  [] = []

The problem with the above isn’t that it doesn’t work: it does. The problem is that it’s not really using foldr. It’s only using it on the first list: there’s still a manual uncons being performed on the second. Ideally, I would want the function to look something like this:

zip :: [a] -> [b] -> [(a,b)]
zip xs ys = foldr f (\_ _ -> []) xs (foldr g (const []) ys)

The best solution I found online only dealt with Folds, not Foldables. You can read it here.

Recursive Types

Reworking the solution online for Foldables, the initial intuition is to have the foldr on the ys produce a function which takes an element of the xs, and returns a function which takes an element of the xs, and so on, finally returning the created list. The problem with that approach is the types involved:

zip :: [a] -> [b] -> [(a,b)]
zip xs = foldr f (const []) xs . foldr g (\_ _ -> []) where
  g e2 r2 e1 r1 = (e1,e2) : (r1 r2)
  f e r x = x e r

You get the error:

Occurs check: cannot construct the infinite type: t0 ~ a -> (t0 -> [(a, b)]) -> [(a, b)].

Haskell’s typechecker doesn’t allow for infinitely recursive types.

You’ll be familiar with this problem if you’ve ever tried to encode the Y-combinator, or if you’ve fiddled around with the recursion-schemes package. You might also be familiar with the solution: a newtype, encapsulating the recursion. In this case, the newtype looks very similar to the signature for foldr:

newtype RecFold a b = 
  RecFold { runRecFold :: a -> (RecFold a b -> b) -> b }

Now you can insert and remove the RecFold wrapper, helping the typechecker to understand the recursive types as it goes:

zip :: [a] -> [b] -> [(a,b)]
zip xs =
  foldr f (const []) xs . RecFold . foldr g (\_ _ -> []) where
    g e2 r2 e1 r1 = (e1,e2) : (r1 (RecFold r2))
    f e r x = runRecFold x e r

As an aside, the performance characteristics of the newtype wrapper are totally opaque to me. There may be significant improvements by using coerce from Data.Coerce, but I haven’t looked into it.

Generalised Zips

The immediate temptation from the function above is to generalise it. First to zipWith, obviously:

zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith c xs =
  foldr f (const []) xs . RecFold . foldr g (\_ _ -> []) where
    g e2 r2 e1 r1 = c e1 e2 : (r1 (RecFold r2))
    f e r x = runRecFold x e r

What’s maybe a little more interesting, though, would be a foldr on two lists. Something which folds through both at once, using a supplied combining function:

foldr2 :: (Foldable f, Foldable g)
       => (a -> b -> c -> c)
       -> c -> f a -> g b -> c
foldr2 c i xs =
  foldr f (const i) xs . RecFold . foldr g (\_ _ -> i) where
    g e2 r2 e1 r1 = c e1 e2 (r1 (RecFold r2))
    f e r x = runRecFold x e r

Of course, once you can do two, you can do three:

foldr3 :: (Foldable f, Foldable g, Foldable h)
       => (a -> b -> c -> d -> d)
       -> d -> f a -> g b -> h c -> d
foldr3 c i xs ys =
  foldr f (const i) xs . RecFold . foldr2 g (\_ _ -> i) ys where
    g e2 e3 r2 e1 r1 = c e1 e2 e3 (r1 (RecFold r2))
    f e r x = runRecFold x e r

And so on.

There’s the added benefit that the above functions work on much more than just lists.

Catamorphisms

Getting a little formal about the above functions, a fold can be described as a catamorphism. This is a name for a pattern of breaking down some recursive structure. There’s a bunch of them in the recursion-schemes package. The question is, then: can you express the above as a kind of catamorphism? Initially, using the same techniques as before, you can:

newtype RecF f a = RecF { unRecF :: Base f (RecF f a -> a) -> a }

zipo :: (Functor.Foldable f, Functor.Foldable g)
     => (Base f (RecF g c) -> Base g (RecF g c -> c) -> c)
     -> f -> g -> c
zipo alg xs ys = cata (flip unRecF) ys (cata (RecF . alg) xs)

Then, coming full circle, you get a quite nice encoding of zip:

zip :: [a] -> [b] -> [(a,b)]
zip = zipo alg where
  alg Nil _ = []
  alg _ Nil = []
  alg (Cons x xs) (Cons y ys) = (x, y) : ys xs

However, the RecF is a little ugly. In fact, it’s possible to write the above without any recursive types, using the RankNTypes extension. (It’s possible that you could do the same with foldr2 as well, but I haven’t figured it out yet)

You can actually use a newtype that’s provided by the recursion-schemes library as-is. It’s Mu. This is required for an encoding of the Y-combinator. It’s usually presented in this form:

newtype Mu a = Roll { unroll :: Mu a -> a }

However, in the recursion-schemes package, its definition looks like this:

newtype Mu f = Mu (forall a. (f a -> a) -> a)

No recursion! The zipo combinator above can be written using Mu like so:

zipo :: (Functor.Foldable f, Functor.Foldable g)
     => (Base f (Mu (Base g) -> c) -> Base g (Mu (Base g)) -> c)
     -> f -> g -> c
zipo alg xs = cata (\x -> alg x . project) xs . refix

And the new version of zip has a slightly more natural order of arguments:

zip :: [a] -> [b] -> [(a,b)]
zip = zipo alg where
  alg Nil _ = []
  alg _ Nil = []
  alg (Cons x xs) (Cons y ys) = (x,y) : xs ys

Zipping Into

There’s one more issue, though, that’s slightly tangential. A lot of the time, the attraction of rewriting functions using folds and catamorphisms is that the function becomes more general: it no longer is restricted to lists. For zip, however, there’s still a pesky list left in the signature:

zip :: (Foldable f, Foldable g) => f a -> g b -> [(a,b)]

It would be a little nicer to be able to zip through something preserving the structure of one of the things being zipped through. For no reason in particular, let’s assume we’ll preserve the structure of the first argument. The function will have to account for the second argument running out before the first, though. A Maybe can account for that:

zipInto :: (Foldable f, Foldable g) 
        => (a -> Maybe b -> c) 
        -> f a -> g b -> f c

If the second argument runs out, Nothing will be passed to the combining function.

It’s clear that this isn’t a fold over the first argument, it’s a traversal. A first go at the function uses the state monad, but restricts the second argument to a list:

zipInto :: Traversable f => (a -> Maybe b -> c) -> f a -> [b] -> f c
zipInto c xs ys = evalState (traverse f xs) ys where
  f x = do
    h <- gets uncons
    case h of 
      Just (y,t) -> do 
        put t
        pure (c x (Just y))
      Nothing -> pure (c x Nothing)

That code can be cleaned up a little:

zipInto :: Traversable f => (a -> Maybe b -> c) -> f a -> [b] -> f c 
zipInto c = evalState . traverse (state . f . c) where
  f x [] = (x Nothing, [])
  f x (y:ys) = (x (Just y), ys)

But really, the uncons needs to go. Another newtype wrapper is needed, and here’s the end result:

newtype RecAccu a b =
  RecAccu { runRecAccu :: a -> (RecAccu a b, b) }
  
zipInto :: (Traversable t, Foldable f)
        => (a -> Maybe b -> c) -> t a -> f b -> t c
zipInto f xs =
  snd . flip (mapAccumL runRecAccu) xs . RecAccu . foldr h i where
    i e = (RecAccu i, f e Nothing)
    h e2 a e1 = (RecAccu a, f e1 (Just e2))