Rose Trees, Breadth-First
In contrast to the more common binary trees, in a rose tree every node can have any number of children.
data Tree a
= Node
root :: a
{ forest :: Forest a
,
}
type Forest a = [Tree a]
One of the important manipulations of this data structure, which forms the basis for several other algorithms, is a breadth-first traversal. I’d like to go through a couple of techniques for implementing it, and how more generally you can often get away with using much simpler data structures if you really pinpoint the API you need from them.
As a general technique, Okasaki (2000) advises that a queue be used:
breadthFirst :: Tree a -> [a]
= go (singleton tr)
breadthFirst tr where
= case pop q of
go q Nothing -> []
Just (Node x xs,qs) -> x : go (qs `append` xs)
There are three functions left undefined there: singleton
, pop
, and append
. They represent the API of our
as-of-yet unimplemented queue, and their complexity will dictate the
complexity of the overall algorithm. As a (bad) first choice, we could
use simple lists, with the functions defined thus:
= [x]
singleton x :xs) = Just (x,xs)
pop (x= Nothing
pop [] = (++) append
Those repeated appends are bad news. The queue needs to be able to support popping from one side and appending from the other, which is something lists absolutely cannot do well.
We could swap in a more general queue implementation, possibly using Data.Sequence, or a pair of lists. But these are more complex and general than we need, so let’s try and pare down the requirements a little more.
First, we don’t need a pop: the go function can be expressed as a fold instead. Second, we don’t need every append to be immediately stuck into the queue, we can batch them, first appending to a structure that’s efficient for appends, and then converting that to a structure which is efficient for folds. In code:
breadthFirst :: Forest a -> [a]
= foldr f b ts []
breadthFirst ts where
Node x xs) fw bw = x : fw (xs : bw)
f (
= []
b [] = foldl (foldr f) b qs [] b qs
We’re consing instead of appending, but the consumption is being done
in the correct direction anyway, because of the foldl
.
Levels
So next step: to get the levels
function from Data.Tree.
Instead of doing a breadth-first traversal, it returns the nodes at each
level of the tree. Conceptually, every time we did the reverse
above (called foldl
), we
will do a cons as well:
levels :: Forest a -> [[a]]
= foldl f b ts [] []
levels ts where
Node x xs) ls qs = k (x : ls) (xs : qs)
f k (
= []
b _ [] = k : foldl (foldl f) b qs [] [] b k qs
Unfolding
The original reason I started work on these problems was this issue
in containers. It concerns the unfoldTreeM_BF
function. An early go at rewriting it, inspired by levels above, looks
like this:
unfoldForestM_BF :: Monad m => (b -> m (a, [b])) -> [b] -> m (Forest a)
unfoldForestM_BF f ts = b [ts] (const id)
where
b [] k = pure (k [] [])
b qs k = foldl (foldr t) b qs [] (\x -> k [] . foldr (uncurry run) id x)
t a fw bw k = do
(x,cs) <- f a
let !n = length cs
fw (cs : bw) (k . (:) (x, n))
run x n xs ys =
case splitAt n ys of
(cs,zs) -> Node x cs : xs zs
It basically performs the same this as the levels function, but
builds the tree back up in the end using the run
function. In order to do that, we
store the length of each subforest on line 9, so that each node knows
how much to take from each level.
A possible optimization is to stop taking the length. Anything in
list processing that takes a length screams “wrong” to me (although it’s
not always true!) so I often try to find a way to avoid it. The first
option would be to keep the cs
on line 8 around, and use it as an indicator for the length.
That keeps it around longer than strictly necessary, though. The other
option is to add a third level: for breadthFirst
above, we had one level;
for levels
, we added another, to
indicate the structure of the nodes and their subtrees; here, we can add
a third, to maintain that structure when building back up:
unfoldForestM_BF :: Monad m => (b -> m (a, [b])) -> [b] -> m (Forest a)
= b [ts] (\ls -> concat . ls)
unfoldForestM_BF f ts where
= pure (k id [])
b [] k = foldl g b qs [] (\ls -> k id . ls)
b qs k
= foldr t (\ls ys -> a ys (k . run ls)) xs [] qs
g a xs qs k
= f a >>= \(x,cs) -> fw (x:xs) (cs:bw)
t a fw xs bw
= uncurry (:) . foldl go ((,) [] . xs) x
run x xs where
:zs) = (Node y z : ys', zs')
go ys y (zwhere
= ys zs (ys',zs')
This unfortunately slows down the code.