Some More List Algorithms
It’s been a while since I last wrote a post (I’ve been busy with my Master’s thesis, which is nearly done), so I thought I would quickly throw out some fun snippets of Haskell I had reason to write over the past couple of weeks.
Zipping With Folds
For some reason, until recently I had been under the impression that
it was impossible to fuse zips efficiently. In other words, I thought
that zip was like tail, in that if it was
implemented using only foldr it would result in an
asymptotic slowdown (tail is normally
,
implemented as a fold it’s
).
Well, it seems like this is not the case. The old zip-folding code I had looks to me now to be the correct complexity: it’s related to How To Zip Folds, by Oleg Kiselyov (although I’m using a different version of the function which can be found on the mailing list). The relevant code is as follows:
newtype Zip a b =
Zip { runZip :: a -> (Zip a b -> b) -> b }
zip :: [a] -> [b] -> [(a,b)]
zip xs ys = foldr xf xb xs (Zip (foldr yf yb ys))
where
xf x xk yk = runZip yk x xk
xb _ = []
yf y yk x xk = (x,y) : xk (Zip yk)
yb _ _ = []There are apparently reasons
for why the Prelude’s zip isn’t allowed to fuse both of its
arguments: I don’t fully understand them, however. (in particular the
linked page says that the fused zip would have different strictness
behaviour, but the version I have above seems to function properly).
This version of zip leads to some more fun solutions to folding puzzles, like this one:
Write a function that is equivalent to:
zipFromEnd xs ys = reverse (zip (reverse xs) (reverse ys))Without creating any intermediate lists.
The desired function is interesting in that, instead of lining up lists according to their first elements, it aligns them according to the ends.
>>> zipFromEnd [1,2,3] "abc"
[(1,'a'),(2,'b'),(3,'c')]
>>> zipFromEnd [1,2,3] "abcd"
[(1,'b'),(2,'c'),(3,'d')]
>>> zipFromEnd [1,2,3,4] "abc"
[(2,'a'),(3,'b'),(4,'c')]The solution here is just to use foldl, and we get the
following:
zipFromEnd :: [a] -> [b] -> [(a,b)]
zipFromEnd xs ys = foldl xf xb xs (Zip (foldl yf yb ys)) []
where
xf xk x yk = runZip yk x xk
xb _ zs = zs
yf yk y x xk zs = xk (Zip yk) ((x,y) : zs)
yb _ _ zs = zsAnother function which is a little interesting is the “zip longest” function:
zipLongest :: (a -> a -> a) -> [a] -> [a] -> [a]
zipLongest c xs ys = foldr xf xb xs (Zip (foldr yf yb ys))
where
xf x xk yk = runZip yk (Just x) xk
xb zs = runZip zs Nothing xb
yf y yk Nothing xk = y : xk (Zip yk)
yf y yk (Just x) xk = c x y : xk (Zip yk)
yb Nothing _ = []
yb (Just x) zs = x : zs (Zip yb)Finally, all of these functions rely on the Zip type,
which is not strictly positive. This means that we can’t use it
in Agda, and it’s tricky to reason about: I wonder what it is about
functions for deforestation that tends to lead to non-strictly-positive
datatypes.
Lexicographic Permutations
The next puzzle I was interested in was finding the next lexicographic permutation of some string. In other words, given some string , you need to find another string that is a permutation of such that , and that there is no string that is a permutation of and . The Wikipedia article on the topic is excellent (and clear), but again the algorithm is described in extremely imperative terms:
- Find the largest index k such that a[k] < a[k + 1]. If no such index exists, the permutation is the last permutation.
- Find the largest index l greater than k such that a[k] < a[l].
- Swap the value of a[k] with that of a[l].
- Reverse the sequence from a[k + 1] up to and including the final element a[n].
The challenge here is to write this algorithm without doing any indexing: indexing is expensive on Haskell lists, and regardless it is cleaner to express it without.
I managed to work out the following:
nextLexPerm :: Ord a => [a] -> Maybe [a]
nextLexPerm [] = Nothing
nextLexPerm (x:xs) = go1 x xs
where
go1 _ [] = Nothing
go1 i (j:xs) = maybe (go2 i j [] xs) (Just . (i:)) (go1 j xs)
go2 i j xs ys
| j <= i = Nothing
| otherwise = Just (fromMaybe (j : foldl (flip (:)) (i:xs) ys) (go3 i (j:xs) ys))
go3 _ _ [] = Nothing
go3 i xs (j:ys) = go2 i j xs ysCircular Sorting
This comes from the Rosetta Code problem Circle Sort. This is a strange little sorting algorithm, where basically you compare elements on opposite sides of an array, swapping them as needed. The example given is the following:
6 7 8 9 2 5 3 4 1
First we compare (and swap) 6 and 1, and
then 7 and 4, and so on, until we reach the
middle. At this point we split the array in two and perform the
procedure on each half. After doing this once it is not the case that
the array is definitely sorted: you may have to repeat the procedure
several (but finitely many) times, until no swaps are performed.
I have absolutely no idea what the practical application for such an odd algorithm would be, but it seemed like an interesting challenge to try implement it in a functional style (i.e. without indices or mutation).
The first thing we have to do is fold the list in half, so we pair up the right items. We’ve actually seen an algorithm to do this before: it’s often called the “tortoise and the hare”, and our previous use was to check if a list was a palindrome. Here’s how we implement it:
halve :: [a] -> [(a,a)]
halve xs = snd (go xs xs)
where
go (y:ys) (_:_:zs) = f y (go ys zs)
go (_:ys) [_] = (ys,[])
go ys [] = (ys,[])
f x (y:ys,zs) = (ys, (x,y) : zs)
>>> halve [6,7,8,9,2,5,3,4,1]
[(6,1),(7,4),(8,3),(9,5)]Notice that the 2 in the very middle of the list is
missing from the output: I’ll describe how to handle that element later
on. In the above piece of code, that 2 actually gets bound
to the underscore (in (_:ys)) in the second clause of
go.
Next we need to do the actual swapping: this is actually pretty straightforward, if we think of the algorithm functionally, rather than imperatively. Instead of swapping things in place, we are building up both halves of the new list, so the “swap” operation should simply decide which list each item goes into.
halve :: Ord a => [a] -> ([a],[a])
halve xs = tl (go xs xs)
where
tl (_,lte,gt) = (lte,gt)
go (y:ys) (_:_:zs) = swap y (go ys zs)
go (_:ys) [_] = (ys,[],[])
go ys [] = (ys,[],[])
swap x (y:ys,lte,gt)
| x <= y = (ys, x : lte, y : gt)
| otherwise = (ys, y : lte, x : gt)At this point we can also see what to do with the middle item: we’ll put it in the higher or lower list, depending on a comparison with the element it’s next to.
halve :: Ord a => [a] -> ([a],[a])
halve xs = tl (go xs xs)
where
tl (_,lte,gt) = (lte,gt)
go (y:ys) (_:_:zs) = swap y (go ys zs)
go ys [] = (ys,[],[])
go (y:ys) [_] = (ys,[y | e],[y | not e])
where e = y <= head ys
swap x (y:ys,lte,gt)
| x <= y = (ys, x : lte, y : gt)
| otherwise = (ys, y : lte, x : gt)Next, we can use this as a helper function in the overall recursive function.
circleSort :: Ord a => [a] -> [a]
circleSort [] = []
circleSort [x] = [x]
circleSort xs =
let (lte,gt) = halve xs
in circleSort lte ++ circleSort (reverse gt)This function isn’t correct (yet). As we mentioned already, we need to run the circle sort procedure multiple times until no swaps occur. We can add in the tracking of swaps like so:
circleSort :: Ord a => [a] -> [a]
circleSort xs = if swapped then circleSort ks else ks
where
(swapped,ks) = go xs
go [] = (False, [])
go [x] = (False, [x])
go xs =
let (s,_,lte,gt) = halve xs xs
(sl,lte') = go lte
(sg,gt' ) = go (reverse gt)
in (s || sl || sg, lte' ++ gt')
halve (y:ys) (_:_:zs) = swap y (halve ys zs)
halve ys [] = (False,ys,[],[])
halve (y:ys) [_] = (False,ys,[y | e],[y | not e])
where e = y <= head ys
swap x (s,y:ys,lte,gt)
| x <= y = (s ,ys, x : lte, y : gt)
| otherwise = (True,ys, y : lte, x : gt)So at this point we actually have a working implementation of the
function, which avoids indices as intended. It has some problems still,
though. First, we call ++, when we could be using
difference lists. Here’s the solution to that:
circleSort :: Ord a => [a] -> [a]
circleSort xs = if swapped then circleSort ks else ks
where
(swapped,ks) = go xs []
go [] zs = (False, zs)
go [x] zs = (False, x:zs)
go xs zs =
let (s,_,lte,gt) = halve xs xs
(sl,lte') = go lte gt'
(sg,gt' ) = go (reverse gt) zs
in (s || sl || sg, lte')
halve (y:ys) (_:_:zs) = swap y (halve ys zs)
halve ys [] = (False,ys,[],[])
halve (y:ys) [_] = (False,ys,[y | e],[y | not e])
where e = y <= head ys
swap x (s,y:ys,lte,gt)
| x <= y = (s ,ys, x : lte, y : gt)
| otherwise = (True,ys, y : lte, x : gt)Next we can actually rewrite the go function to allow
for a certain amount of tail recursion (kind of):
circleSort :: Ord a => [a] -> [a]
circleSort xs = if swapped then circleSort ks else ks
where
(swapped,ks) = go xs (False,[])
go [] (s,ks) = (s,ks)
go [x] (s,ks) = (s,x:ks)
go xs (s,ks) =
let (s',_,ls,rs) = halve s xs xs
in go ls (go (reverse rs) (s',ks))
halve s (y:ys) (_:_:zs) = swap y (halve s ys zs)
halve s ys [] = (s,ys,[],[])
halve s (y:ys) [_] = (s,ys,[y | e],[y | not e])
where e = y <= head ys
swap x (s,y:ys,ls,rs)
| x <= y = ( s,ys,x:ls,y:rs)
| otherwise = (True,ys,y:ls,x:rs)Next, we call reverse: but we can avoid the reverse by
passing a parameter which tells us which direction we’re walking down
the list. Since the swapping logic is symmetric, we’re able to just
invert some of the functions. It is a little tricky, though:
circleSort :: Ord a => [a] -> [a]
circleSort xs = if swapped then circleSort ks else ks
where
(swapped,ks) = go False xs (False,[])
go d [] (s,ks) = (s,ks)
go d [x] (s,ks) = (s,x:ks)
go d xs (s,ks) =
let (s',_,ls,rs) = halve d s xs xs
in go False ls (go True rs (s',ks))
halve d s (y:ys) (_:_:zs) = swap d y (halve d s ys zs)
halve d s ys [] = (s,ys,[],[])
halve d s (y:ys) [_] = (s,ys,[y | e],[y | not e])
where e = y <= head ys
swap d x (s,y:ys,ls,rs)
| bool (<=) (<) d x y = ( d || s,ys,x:ls,y:rs)
| otherwise = (not d || s,ys,y:ls,x:rs)So there it is! The one-pass, purely function implementation of circle sort. Very possibly the most useless piece of code I’ve ever written.