Convolutions and Semirings
I have been working a little more on my semirings
library recently, and I have come across some interesting functions
in the process. First, a quick recap on the Semiring
class
and some related functions:
class Semiring a where
one :: a
zero :: a
infixl 6 <+>
(<+>) :: a -> a -> a
infixl 7 <.>
(<.>) :: a -> a -> a
add :: (Foldable f, Semiring a) => f a -> a
= foldl' (<+>) zero
add
mul :: (Foldable f, Semiring a) => f a -> a
= foldl' (<.>) one
mul
instance Semiring Integer where
= 1
one = 0
zero <+>) = (+)
(<.>) = (*)
(
instance Semiring Bool where
= True
one = False
zero <+>) = (||)
(<.>) = (&&) (
You can think of it as a replacement for Num
, but it
turns out to be much more generally useful than that.
Matrix Multiplication
The first interesting function is to do with matrix multiplication. Here’s the code for multiplying two matrices represented as nested lists:
mulMatrix :: Semiring a => [[a]] -> [[a]] -> [[a]]
= map (\row -> map (add . zipWith (<.>) row) cs) xs
mulMatrix xs ys where
= transpose ys cs
One of the issues with this code (other than its woeful performance)
is that it seems needlessly list-specific. zipWith
seems
like the kind of thing that exists on a bunch of different structures.
Indeed, the ZipList
wrapper uses zipWith
as its
<*>
implementation. Let’s try that for now:
mulMatrix :: (Semiring a, Applicative f) => f (f a) -> f (f a) -> f (f a)
= fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
mulMatrix xs ys where
= transpose ys cs
Of course, now add
needs to
work on our f
, so it should be
Foldable
mulMatrix :: (Semiring a, Applicative f, Foldable f)
=> f (f a) -> f (f a) -> f (f a)
= ... mulMatrix
transpose
is the missing
piece now. A little bit of Applicative
magic can help us out again, though: sequenceA
is
transpose
on ZipList
s (McBride and
Paterson 2008).
mulMatrix :: (Semiring a, Applicative f, Traversable f)
=> f (f a) -> f (f a) -> f (f a)
=
mulMatrix xs ys fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
where
= sequenceA ys cs
One further generalization: The two f
s don’t actually need to be the
same:
mulMatrix :: (Applicative n
Traversable m
,Applicative m
,Applicative p
,Semiring a)
,=> n (m a) -> m (p a) -> n (p a)
= fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
mulMatrix xs ys where
= sequenceA ys cs
Happily, the way that the wrappers (n
, m
, and p
) match up coincides precisely with
how matrix dimensions match up in matrix multiplication. Quoting from
the Wikipedia
definition:
if is an matrix and is an matrix, their matrix product is an matrix
This function is present in the linear
package with some different constraints. In fairness, Applicative
probably isn’t the best thing to use here since it doesn’t work for so
many instances (MonadZip
or something similar may be more suitable), but it’s very handy to have,
and works out-of the box for types like:
data Three a
= Three a a a
deriving (Functor, Foldable, Traversable, Eq, Ord, Show)
instance Applicative Three where
pure x = Three x x x
Three fx fy fz <*> Three xx xy xz = Three (fx xx) (fy xy) (fz xz)
Which makes it (to my mind) useful enough to keep. Also, it hugely simplified the code for matrix multiplication in square matrices I had, from Okasaki (1999).
Convolutions
If you’re putting a general class in a library that you want people
to use, and there exist sensible instances for common Haskell types, you
should probably provide those instances in the library to avoid orphans.
The meaning of “sensible” here is vague: generally speaking, if there is
only one obvious or clear instance, then it’s sensible. For a list
instance for the semiring class, for instance, I could figure out
several law-abiding definitions for <+>
,
one
and zero
, but only one for <.>
: polynomial multiplication. You know,
where you multiply two polynomials like so:
A more general definition looks something like this:
Or, fully generalized:
So it turns out that you can represent polynomials pretty elegantly as lists. Take an example from above:
And rearrange it in order of the powers of :
And fill in missing coefficients:
And then the list representation of that polynomial is the list of those coefficients:
3, 2, 0, 1] [
For me, the definitions of multiplication above were pretty hard to understand. In Haskell, however, the definition is quite beautiful:
instance Semiring a => Semiring [a] where
= [one]
one = []
zero <+> ys = ys
[] <+> [] = xs
xs :xs) <+> (y:ys) = x <+> y : (xs <+> ys)
(x<.> [] = []
_ <.> _ = []
[] :xs) <.> (y:ys) = (x<.>y) : map (x<.>) ys <+> xs <.> (y:ys) (x
This definition for <.>
can
be found on page 4 of McIlroy (1999). Although there
was a version of the paper with a slightly different definition:
<.> [] = []
_ <.> _ = []
[] :xs) <.> (y:ys)
(x= (x<.>y) : (map (x<.>) ys <+> map (<.>y) xs <+> (zero : (xs <.> ys)))
Similar to one which appeared in Dolan (2013).
As it happens, I prefer the first definition. It’s shorter, and I figured out how to write it as a fold:
<.> [] = []
_ <.> ys = foldr f [] xs where
xs = map (x <.>) ys <+> (zero : zs) f x zs
And if you inline the <+>
, you
get a reasonable speedup:
<.> ys = foldr f [] xs
xs where
= foldr (g x) id ys (zero : zs)
f x zs :zs) = x <.> y <+> z : a zs
g x y a (z= x <.> y : a [] g x y a []
The definition of <+>
can
also use a fold on either side for fusion purposes:
<+>) = foldr f id where
(:ys) = x <+> y : xs ys
f x xs (y= x : xs []
f x xs []
<+>) = flip (foldr f id) where
(:xs) = x <+> y : ys xs
f y ys (x= y : ys [] f y ys []
There are rules in the library to choose one of the above definitions if fusion is available.
This definition is much more widely useful than it may seem at first. Say, for instance, you wanted to search through pairs of things from two infinite lists. You can’t use the normal way to pair things for lists, the Cartesian product, because it will diverge:
| x <- [1..], y <- [1..]]
[(x,y) -- [(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),(1,8),(1,9),(1,10)...
You’ll never get beyond 1 in the first list. Zipping isn’t an option either, because you won’t really explore the search space, only corresponding pairs. Brent Yorgey showed that if you want a list like this:
-y) | x <- [0..], y <- [0..x] ]
[(y,x-- [(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0)...
Then what you’re looking for is a convolution (the same thing as
polynomial multiplication). <.>
above can be adapted readily:
convolve :: [a] -> [b] -> [[(a,b)]]
= foldr f [] xs
convolve xs ys where
= foldr (g x) id ys ([] : zs)
f x zs :zs) = ((x, y) : z) : a zs
g x y a (z= [(x, y)] : a [] g x y a []
Flatten out this result to get your ordering. This convolution is a
little different from the one in the blog post. By inlining <+>
we
can avoid the expensive ++
function,
without using difference lists.
Long Multiplication
Here’s another cool use of lists as polynomials: they can be used as a positional numeral system. Most common numeral systems are positional, including Arabic (the system you most likely use, where twenty-four is written as 24) and binary. Non-positional systems are things like Roman numerals. Looking at the Arabic system for now, we see that the way of writing down numbers:
Can be thought of the sum of each digit multiplied by ten to the power of its position:
Where the positions are numbered from the right. In other words, it’s our polynomial list from above in reverse. As well as that, the convolution is long multiplication.
Now, taking this straight off we can try some examples:
-- 12 + 15 = 27
2, 1] <+> [5, 1] == [7, 2]
[
-- 23 * 2 = 46
3, 2] <.> [2] == [6, 4] [
The issue, of course, is that we’re not handling carrying properly:
6] <+> [6] == [12] [
No matter: we can perform all the carries after the addition, and everything works out fine:
carry :: Integral a
=> a -> [a] -> [a]
= foldr f (toBase base) xs 0
carry base xs where
= r : a q where
f e a cin = quotRem (cin + e) base
(q,r)
toBase :: Integral a => a -> a -> [a]
= unfoldr f where
toBase base 0 = Nothing
f = Just (swap (quotRem n base)) f n
Wrap the whole thing in a newtype and we can have a Num
instance:
newtype Positional
= Positional
withBase :: Integer -> [Integer]
{
}
instance Num Positional where
Positional x + Positional y = Positional (carry <*> x <+> y)
Positional x * Positional y = Positional (carry <*> x <.> y)
fromInteger m = Positional (\base -> toBase base m)
abs = id
signum = id
negate = id
toDigits :: Integer -> Positional -> [Integer]
= reverse (withBase p base) toDigits base p
This also lets us choose our base after the fact:
= (sum . map fromInteger) [1..100]
sumHundred 10 sumHundred
toDigits -- [5,0,5,0]
2 sumHundred
toDigits -- [1,0,0,1,1,1,0,1,1,1,0,1,0]
Vectors
All the hand-optimizing, inlining, and fusion magic in the world won’t make a list-based implementation of convolution faster than a proper one on vectors, unfortunately. In particular, for larger vectors, a fast Fourier transform can be used. Also, usually code like this will be parallelized, rather than sequential. That said, it can be helpful to implement the slower version on vectors, in the usual indexed way, for comparison’s sake:
instance Semiring a =>
Semiring (Vector a) where
= Vector.singleton one
one = Vector.empty
zero <+> ys =
xs case compare (Vector.length xs) (Vector.length ys) of
EQ -> Vector.zipWith (<+>) xs ys
LT -> Vector.unsafeAccumulate (<+>) ys (Vector.indexed xs)
GT -> Vector.unsafeAccumulate (<+>) xs (Vector.indexed ys)
<.> kernel
signal | Vector.null signal = Vector.empty
| Vector.null kernel = Vector.empty
| otherwise = Vector.generate (slen + klen - 1) f
where
=
f n
foldl'->
(\a k <+>
a <.>
Vector.unsafeIndex signal k - k))
Vector.unsafeIndex kernel (n
zero.. kmax]
[kmin where
!kmin = max 0 (n - (klen - 1))
!kmax = min n (slen - 1)
!slen = Vector.length signal
!klen = Vector.length kernel
Search
As has been observed before (Rivas, Jaskelioff, and Schrijvers
2015) there’s a pretty suggestive similarity between semirings
and the Applicative
/Alternative
classes in Haskell:
class Semiring a where
one :: a
zero :: a
(<+>) :: a -> a -> a
(<.>) :: a -> a -> a
class Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
class Alternative f where
empty :: f a
(<|>) :: f a -> f a -> f a
So can our implementation of convolution be used to implement the methods for these classes? Partially:
newtype Search f a = Search { runSearch :: [f a] }
instance Functor f => Functor (Search f) where
fmap f (Search xs) = Search ((fmap.fmap) f xs)
instance Alternative f => Applicative (Search f) where
pure x = Search [pure x]
<*> Search [] = Search []
_ Search xs <*> Search ys = Search (foldr f [] xs) where
= foldr (g x) id ys (empty : zs)
f x zs :zs) = (x <*> y <|> z) : a zs
g x y a (z= (x <*> y) : a []
g x y a []
instance Alternative f => Alternative (Search f) where
Search xs <|> Search ys = Search (go xs ys) where
= ys
go [] ys = xs
go xs [] :xs) (y:ys) = (x <|> y) : go xs ys
go (x= Search [] empty
At first, this seems perfect: the types all match up, and the
definitions seem sensible. The issue is with the laws: Applicative
and Alternative
are missing four that semirings require. In particular:
commutativity of plus, annihilation by zero, and distributivity left and
right:
<|> ys = ys <|> xs
xs <*> xs = fs <*> empty = empty
empty <*> (xs <|> ys) = fs <*> xs <|> fs <*> ys
fs <|> gs) <*> xs = fs <*> xs <|> gs <*> ys (fs
The vast majority of the instances of Alternative
today fail one or more of these laws. Taking lists as an example, ++
obviously
isn’t commutative, and <*>
only
distributes when it’s on the right.
What’s the problem, though? Polynomial multiplication follows
more laws than those required by Applicative
:
why should that worry us? Unfortunately, in order for multiplication to
follow those laws, it actually relies on the underlying semiring being
law-abiding. And it fails the applicative laws when it
isn’t.
There are two angles from which we could come at this problem: either
we relax the semiring laws and try and make our implementation of
convolution rely on them as little as possible, or we find Alternative
instances which follow the semiring laws. Or we could meet in the
middle, relaxing the laws as much as possible until we find some Alternative
s
that meet our standards.
This has actually been accomplished in several papers: the previously mentioned Rivas, Jaskelioff, and Schrijvers (2015) discusses near-semirings, defined as semiring-like structures with associativity, identity, and these two laws:
In contrast to normal semirings, zero only annihilates when it’s on the left, and multiplication only distributes over addition when it’s on the right. Addition is not required to be commutative.
The lovely paper Spivey (2009) has a similar concept: a “bunch”.
class Bunch m where
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
zero :: m a
(<|>) :: m a -> m a -> m a
wrap :: m a -> m a
The laws are all the same (with <*>
implemented in terms of >>=
),
and the extra wrap
operation can
be expressed like so:
wrap :: Alternative f => Search f a -> Search f a
Search xs) = Search (empty : xs) wrap (
A definition of >>=
for
our polynomials is also provided:
>>= _ = []
[] :xs) >>= f = foldr (<|>) empty (fmap f x) <|> wrap (xs >>= f) (x
This will require the underlying f
to be Foldable
. We
can inline a little, and express the whole thing as a fold:
instance (Foldable f, Alternative f) => Monad (Search f) where
Search xs >>= k = foldr f empty xs where
= foldr ((<|>) . k) (wrap a) e f e a
For Search
to meet
the requirements of a bunch, the paper notes that the f
must be assumed to be a bag, i.e.,
the order of its elements must be ignored.
Kiselyov et al. (2005) kind of goes the other direction, defining a monad which has fair disjunction and conjunction. Unfortunately, the fair conjunction loses associativity.
Distance
The end of the paper on algebras for combinatorial search wonders if
notions of distance could be added to some of the algebras. I
think that should be as simple as supplying a suitable
near-semiring for f
, but the
definition of >>=
would need to be changed. The near-semiring I had in mind was the
probability monad. It works correctly if inlined:
newtype Search s a = Search { runSearch :: [[(a,s)]] }
instance Functor (Search s) where
fmap f (Search xs) = Search ((fmap.fmap.first) f xs)
instance Semiring s => Applicative (Search s) where
pure x = Search [[(x,one)]]
<*> Search [] = Search []
_ Search xs <*> Search ys = Search (foldr f [] xs) where
= foldr (g x) id ys (empty : zs)
f x zs :zs) = (m x y ++ z) : a zs
g x y a (z= (m x y) : a []
g x y a [] = [(l r, lp<.>rp) | (l,lp) <- ls, (r,rp) <- rs]
m ls rs
instance Semiring s => Alternative (Search s) where
Search xs <|> Search ys = Search (go xs ys) where
= ys
go [] ys = xs
go xs [] :xs) (y:ys) = (x ++ y) : go xs ys
go (x= Search []
empty
wrap :: Search s a -> Search s a
Search xs) = Search ([] : xs)
wrap (
instance Semiring s => Monad (Search s) where
Search xs >>= k = foldr f empty xs where
= foldr ((<|>) . uncurry (mulIn . k)) (wrap a) e
f e a Search x) xp = Search ((fmap.fmap.fmap) (xp<.>) x) mulIn (
But I couldn’t figure out how to get it to work for a more generalized inner monad. The above could probably be sped up, or randomized, using the many well-known techniques for probability monad optimization.