Convolutions and Semirings

Posted on October 13, 2017

Tags: Haskell, Semirings

I have been working a little more on my semirings library recently, and I have come across some interesting functions in the process. First, a quick recap on the Semiring class and some related functions:

class Semiring a where
  one :: a
  zero :: a
  infixl 6 <+>
  (<+>) :: a -> a -> a
  infixl 7 <.>
  (<.>) :: a -> a -> a

add :: (Foldable f, Semiring a) => f a -> a
add = foldl' (<+>) zero

mul :: (Foldable f, Semiring a) => f a -> a
mul = foldl' (<.>) one

instance Semiring Integer where
  one = 1
  zero = 0
  (<+>) = (+)
  (<.>) = (*)

instance Semiring Bool where
  one = True
  zero = False
  (<+>) = (||)
  (<.>) = (&&)

You can think of it as a replacement for Num, but it turns out to be much more generally useful than that.

Matrix Multiplication

The first interesting function is to do with matrix multiplication. Here’s the code for multiplying two matrices represented as nested lists:

mulMatrix :: Semiring a => [[a]] -> [[a]] -> [[a]]
mulMatrix xs ys = map (\row -> map (add . zipWith (<.>) row) cs) xs
  where
    cs = transpose ys

One of the issues with this code (other than its woeful performance) is that it seems needlessly list-specific. zipWith seems like the kind of thing that exists on a bunch of different structures. Indeed, the ZipList wrapper uses zipWith as its <*> implementation. Let’s try that for now:

mulMatrix :: (Semiring a, Applicative f) => f (f a) -> f (f a) -> f (f a)
mulMatrix xs ys = fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
  where
    cs = transpose ys

Of course, now add needs to work on our f, so it should be Foldable

mulMatrix 
  :: (Semiring a, Applicative f, Foldable f) 
  => f (f a) -> f (f a) -> f (f a)
mulMatrix = ...

transpose is the missing piece now. A little bit of Applicative magic can help us out again, though: sequenceA is transpose on ZipLists (McBride and Paterson 2008).

mulMatrix 
  :: (Semiring a, Applicative f, Traversable f) 
  => f (f a) -> f (f a) -> f (f a)
mulMatrix xs ys = 
    fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
  where
    cs = sequenceA ys

One further generalization: The two fs don’t actually need to be the same:

mulMatrix
    :: (Applicative n
       ,Traversable m
       ,Applicative m
       ,Applicative p
       ,Semiring a)
    => n (m a) -> m (p a) -> n (p a)
mulMatrix xs ys = fmap (\row -> fmap (add . liftA2 (<.>) row) cs) xs
  where
    cs = sequenceA ys

Happily, the way that the wrappers (n, m, and p) match up coincides precisely with how matrix dimensions match up in matrix multiplication. Quoting from the Wikipedia definition:

if $A$ is an $n \times m$ matrix and $B$ is an $m \times p$ matrix, their matrix product $AB$ is an $n \times p$ matrix

This function is present in the linear package with some different constraints. In fairness, Applicative probably isn’t the best thing to use here since it doesn’t work for so many instances (MonadZip or something similar may be more suitable), but it’s very handy to have, and works out-of the box for types like:

data Three a 
  = Three a a a 
  deriving (Functor, Foldable, Traversable, Eq, Ord, Show)

instance Applicative Three where
  pure x = Three x x x
  Three fx fy fz <*> Three xx xy xz = Three (fx xx) (fy xy) (fz xz)

Which makes it (to my mind) useful enough to keep. Also, it hugely simplified the code for matrix multiplication in square matrices I had, from Okasaki (1999).

Convolutions

If you’re putting a general class in a library that you want people to use, and there exist sensible instances for common Haskell types, you should probably provide those instances in the library to avoid orphans. The meaning of “sensible” here is vague: generally speaking, if there is only one obvious or clear instance, then it’s sensible. For a list instance for the semiring class, for instance, I could figure out several law-abiding definitions for <+>, one and zero, but only one for <.>: polynomial multiplication. You know, where you multiply two polynomials like so:

$(x^3 + 2x + 3)(5x + 3x^2 + 4) = 9x^5 + 15x^4 + 18x^3 + 28x^2 + 38x + 24$

A more general definition looks something like this:

$(a_0x^0 + a_1x^1 + a_2x^2)(b_0x^0 + b_1x^1 + b_2x^2) =$ $a_0b_0x^0 + (a_0b_1 + a_1b_0)x^1 + (a_0b_2 + a_1b_1 + a_2b_0)x^2 + (a_1b_2 + a_2b_1)x^3 + a_2b_2x^4$

Or, fully generalized:

$c_k = a_0b_k + a_1b_{k-1} + \ldots + a_{k-1}b_1 + a_kb_0$ $f(x) \times g(x) = \sum_{i=0}^{n+m}c_ix^i$

So it turns out that you can represent polynomials pretty elegantly as lists. Take an example from above:

$x^3 + 2x + 3$

And rearrange it in order of the powers of $x$ :

$3x^0 + 2x^1 + x^3$

And fill in missing coefficients:

$3x^0 + 2x^1 + 0x^2 + 1x^3$

And then the list representation of that polynomial is the list of those coefficients:

[3, 2, 0, 1]

For me, the definitions of multiplication above were pretty hard to understand. In Haskell, however, the definition is quite beautiful:

instance Semiring a => Semiring [a] where
  one = [one]
  zero = []
  [] <+> ys = ys
  xs <+> [] = xs
  (x:xs) <+> (y:ys) = x <+> y : (xs <+> ys)
  _ <.> [] = []
  [] <.> _ = []
  (x:xs) <.> (y:ys) = (x<.>y) : map (x<.>) ys <+> xs <.> (y:ys)

This definition for <.> can be found on page 4 of McIlroy (1999). Although there was a version of the paper with a slightly different definition:

_ <.> [] = []
[] <.> _ = []
(x:xs) <.> (y:ys) 
  = (x<.>y) : (map (x<.>) ys <+> map (<.>y) xs <+> (zero : (xs <.> ys)))

Similar to one which appeared in Dolan (2013).

As it happens, I prefer the first definition. It’s shorter, and I figured out how to write it as a fold:

_ <.> [] = []
xs <.> ys = foldr f [] xs where
  f x zs = map (x <.>) ys <+> (zero : zs)

And if you inline the <+>, you get a reasonable speedup:

xs <.> ys = foldr f [] xs
  where
    f x zs = foldr (g x) id ys (zero : zs)
    g x y a (z:zs) = x <.> y <+> z : a zs
    g x y a [] = x <.> y : a []

The definition of <+> can also use a fold on either side for fusion purposes:

(<+>) = foldr f id where
  f x xs (y:ys) = x <+> y : xs ys
  f x xs [] = x : xs []

(<+>) = flip (foldr f id) where
  f y ys (x:xs) = x <+> y : ys xs
  f y ys [] = y : ys []

There are rules in the library to choose one of the above definitions if fusion is available.

This definition is much more widely useful than it may seem at first. Say, for instance, you wanted to search through pairs of things from two infinite lists. You can’t use the normal way to pair things for lists, the Cartesian product, because it will diverge:

[(x,y) | x <- [1..], y <- [1..]]
-- [(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),(1,8),(1,9),(1,10)...

You’ll never get beyond 1 in the first list. Zipping isn’t an option either, because you won’t really explore the search space, only corresponding pairs. Brent Yorgey showed that if you want a list like this:

[(y,x-y) | x <- [0..], y <- [0..x] ]
-- [(0,0),(0,1),(1,0),(0,2),(1,1),(2,0),(0,3),(1,2),(2,1),(3,0)...

Then what you’re looking for is a convolution (the same thing as polynomial multiplication). <.> above can be adapted readily:

convolve :: [a] -> [b] -> [[(a,b)]]
convolve xs ys = foldr f [] xs
  where
    f x zs = foldr (g x) id ys ([] : zs)
    g x y a (z:zs) = ((x, y) : z) : a zs
    g x y a [] = [(x, y)] : a []

Flatten out this result to get your ordering. This convolution is a little different from the one in the blog post. By inlining <+> we can avoid the expensive ++ function, without using difference lists.

Long Multiplication

Here’s another cool use of lists as polynomials: they can be used as a positional numeral system. Most common numeral systems are positional, including Arabic (the system you most likely use, where twenty-four is written as 24) and binary. Non-positional systems are things like Roman numerals. Looking at the Arabic system for now, we see that the way of writing down numbers:

$1989$

Can be thought of the sum of each digit multiplied by ten to the power of its position:

$1989 = 1 \times 10^3 \plus 9 \times 10^2 \plus 8 \times 10^1 \plus 9 \times 10^0$ $1989 = 1 \times 1000 \plus 9 \times 100 \plus 8 \times 10 \plus 9 \times 1$ $1989 = 1000 \plus 900 \plus 80 \plus 9$ $1989 = 1989$

Where the positions are numbered from the right. In other words, it’s our polynomial list from above in reverse. As well as that, the convolution is long multiplication.

Now, taking this straight off we can try some examples:

-- 12 + 15 = 27
[2, 1] <+> [5, 1] == [7, 2]

-- 23 * 2 = 46
[3, 2] <.> [2] == [6, 4]

The issue, of course, is that we’re not handling carrying properly:

[6] <+> [6] == [12]

No matter: we can perform all the carries after the addition, and everything works out fine:

carry
    :: Integral a
    => a -> [a] -> [a]
carry base xs = foldr f (toBase base) xs 0
  where
    f e a cin = r : a q where
      (q,r) = quotRem (cin + e) base
        
toBase :: Integral a => a -> a -> [a]
toBase base = unfoldr f where
  f 0 = Nothing
  f n = Just (swap (quotRem n base))

Wrap the whole thing in a newtype and we can have a Num instance:

newtype Positional 
  = Positional 
  { withBase :: Integer -> [Integer] 
  } 

instance Num Positional where
  Positional x + Positional y = Positional (carry <*> x <+> y)
  Positional x * Positional y = Positional (carry <*> x <.> y)
  fromInteger m = Positional (\base -> toBase base m)
  abs = id
  signum = id
  negate = id
  
toDigits :: Integer -> Positional -> [Integer]
toDigits base p = reverse (withBase p base)

This also lets us choose our base after the fact:

sumHundred = (sum . map fromInteger) [1..100]
toDigits 10 sumHundred
-- [5,0,5,0]
toDigits 2 sumHundred
-- [1,0,0,1,1,1,0,1,1,1,0,1,0]

Vectors

All the hand-optimizing, inlining, and fusion magic in the world won’t make a list-based implementation of convolution faster than a proper one on vectors, unfortunately. In particular, for larger vectors, a fast Fourier transform can be used. Also, usually code like this will be parallelized, rather than sequential. That said, it can be helpful to implement the slower version on vectors, in the usual indexed way, for comparison’s sake:

instance Semiring a =>
         Semiring (Vector a) where
    one = Vector.singleton one
    zero = Vector.empty
    xs <+> ys =
        case compare (Vector.length xs) (Vector.length ys) of
            EQ -> Vector.zipWith (<+>) xs ys
            LT -> Vector.unsafeAccumulate (<+>) ys (Vector.indexed xs)
            GT -> Vector.unsafeAccumulate (<+>) xs (Vector.indexed ys)
    signal <.> kernel
      | Vector.null signal = Vector.empty
      | Vector.null kernel = Vector.empty
      | otherwise = Vector.generate (slen + klen - 1) f
      where
        f n =
            foldl'
                (\a k ->
                      a <+>
                      Vector.unsafeIndex signal k <.>
                      Vector.unsafeIndex kernel (n - k))
                zero
                [kmin .. kmax]
          where
            !kmin = max 0 (n - (klen - 1))
            !kmax = min n (slen - 1)
        !slen = Vector.length signal
        !klen = Vector.length kernel

Search

As has been observed before (Rivas, Jaskelioff, and Schrijvers 2015) there’s a pretty suggestive similarity between semirings and the Applicative/Alternative classes in Haskell:

class Semiring a where
  one :: a
  zero :: a
  (<+>) :: a -> a -> a
  (<.>) :: a -> a -> a

class Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

class Alternative f where
  empty :: f a
  (<|>) :: f a -> f a -> f a

So can our implementation of convolution be used to implement the methods for these classes? Partially:

newtype Search f a = Search { runSearch :: [f a] }

instance Functor f => Functor (Search f) where
  fmap f (Search xs) = Search ((fmap.fmap) f xs)

instance Alternative f => Applicative (Search f) where
  pure x = Search [pure x]
  _ <*> Search [] = Search []
  Search xs <*> Search ys = Search (foldr f [] xs) where
    f x zs = foldr (g x) id ys (empty : zs)
    g x y a (z:zs) = (x <*> y <|> z) : a zs
    g x y a [] = (x <*> y) : a []

instance Alternative f => Alternative (Search f) where
  Search xs <|> Search ys = Search (go xs ys) where
    go [] ys = ys
    go xs [] = xs
    go (x:xs) (y:ys) = (x <|> y) : go xs ys
  empty = Search []

At first, this seems perfect: the types all match up, and the definitions seem sensible. The issue is with the laws: Applicative and Alternative are missing four that semirings require. In particular: commutativity of plus, annihilation by zero, and distributivity left and right:

xs <|> ys = ys <|> xs
empty <*> xs = fs <*> empty = empty
fs <*> (xs <|> ys) = fs <*> xs <|> fs <*> ys
(fs <|> gs) <*> xs = fs <*> xs <|> gs <*> ys

The vast majority of the instances of Alternative today fail one or more of these laws. Taking lists as an example, ++ obviously isn’t commutative, and <*> only distributes when it’s on the right.

What’s the problem, though? Polynomial multiplication follows more laws than those required by Applicative: why should that worry us? Unfortunately, in order for multiplication to follow those laws, it actually relies on the underlying semiring being law-abiding. And it fails the applicative laws when it isn’t.

There are two angles from which we could come at this problem: either we relax the semiring laws and try and make our implementation of convolution rely on them as little as possible, or we find Alternative instances which follow the semiring laws. Or we could meet in the middle, relaxing the laws as much as possible until we find some Alternatives that meet our standards.

This has actually been accomplished in several papers: the previously mentioned Rivas, Jaskelioff, and Schrijvers (2015) discusses near-semirings, defined as semiring-like structures with associativity, identity, and these two laws:

$0 \times x = 0$ $(x \plus y) \times z = (x \times z) \plus (y \times z)$

In contrast to normal semirings, zero only annihilates when it’s on the left, and multiplication only distributes over addition when it’s on the right. Addition is not required to be commutative.

The lovely paper Spivey (2009) has a similar concept: a “bunch”.

class Bunch m where
  return :: a -> m a
  (>>=) :: m a -> (a -> m b) -> m b
  zero :: m a
  (<|>) :: m a -> m a -> m a
  wrap :: m a -> m a

The laws are all the same (with <*> implemented in terms of >>=), and the extra wrap operation can be expressed like so:

wrap :: Alternative f => Search f a -> Search f a
wrap (Search xs) = Search (empty : xs)

A definition of >>= for our polynomials is also provided:

[] >>= _ = []
(x:xs) >>= f = foldr (<|>) empty (fmap f x) <|> wrap (xs >>= f)

This will require the underlying f to be Foldable. We can inline a little, and express the whole thing as a fold:

instance (Foldable f, Alternative f) => Monad (Search f) where
  Search xs >>= k = foldr f empty xs where
    f e a = foldr ((<|>) . k) (wrap a) e

For Search to meet the requirements of a bunch, the paper notes that the f must be assumed to be a bag, i.e., the order of its elements must be ignored.

Kiselyov et al. (2005) kind of goes the other direction, defining a monad which has fair disjunction and conjunction. Unfortunately, the fair conjunction loses associativity.

Distance

The end of the paper on algebras for combinatorial search wonders if notions of distance could be added to some of the algebras. I think that should be as simple as supplying a suitable near-semiring for f, but the definition of >>= would need to be changed. The near-semiring I had in mind was the probability monad. It works correctly if inlined:

newtype Search s a = Search { runSearch :: [[(a,s)]] }

instance Functor (Search s) where
  fmap f (Search xs) = Search ((fmap.fmap.first) f xs)

instance Semiring s => Applicative (Search s) where
  pure x = Search [[(x,one)]]
  _ <*> Search [] = Search []
  Search xs <*> Search ys = Search (foldr f [] xs) where
    f x zs = foldr (g x) id ys (empty : zs)
    g x y a (z:zs) = (m x y ++ z) : a zs
    g x y a [] = (m x y) : a []
    m ls rs = [(l r, lp<.>rp) | (l,lp) <- ls, (r,rp) <- rs]

instance Semiring s => Alternative (Search s) where
  Search xs <|> Search ys = Search (go xs ys) where
    go [] ys = ys
    go xs [] = xs
    go (x:xs) (y:ys) = (x ++ y) : go xs ys
  empty = Search []

wrap :: Search s a -> Search s a
wrap (Search xs) = Search ([] : xs)

instance Semiring s => Monad (Search s) where
  Search xs >>= k = foldr f empty xs where
    f e a = foldr ((<|>) . uncurry (mulIn . k)) (wrap a) e
    mulIn (Search x) xp = Search ((fmap.fmap.fmap) (xp<.>) x)

But I couldn’t figure out how to get it to work for a more generalized inner monad. The above could probably be sped up, or randomized, using the many well-known techniques for probability monad optimization.

References

Dolan, Stephen. 2013. “Fun with semirings: A functional pearl on the abuse of linear algebra.” In, 48:101. ACM Press. doi:10.1145/2500365.2500613. https://www.cl.cam.ac.uk/~sd601/papers/semirings.pdf.

Kiselyov, Oleg, Chung-chieh Shan, Daniel P Friedman, and Amr Sabry. 2005. “Backtracking, interleaving, and terminating monad transformers (functional pearl).” ACM SIGPLAN Notices 40 (9): 192–203. http://okmij.org/ftp/Computation/monads.html#LogicT.

McBride, Conor, and Ross Paterson. 2008. “Applicative programming with effects.” Journal of functional programming 18 (01): 1–13. http://strictlypositive.org/Idiom.pdf.

McIlroy, M. Douglas. 1999. “Power Series, Power Serious.” J. Funct. Program. 9 (3) (May): 325–337. doi:10.1017/S0956796899003299. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.333.3156&rep=rep1&type=pdf.

Okasaki, Chris. 1999. “From Fast Exponentiation to Square Matrices: An Adventure in Types.” In Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP’99), Paris, France, September 27-29, 1999, 34:28. ACM. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.456.357&rep=rep1&type=pdf.

Rivas, Exequiel, Mauro Jaskelioff, and Tom Schrijvers. 2015. “From monoids to near-semirings: The essence of MonadPlus and Alternative.” In Proceedings of the 17th International Symposium on Principles and Practice of Declarative Programming, 196–207. ACM. doi:10.1145/2790449.2790514. http://www.fceia.unr.edu.ar/~mauro/pubs/FromMonoidstoNearsemirings.pdf.

Spivey, J. Michael. 2009. “Algebras for combinatorial search.” Journal of Functional Programming 19 (3-4) (July): 469–487. doi:10.1017/S0956796809007321. https://pdfs.semanticscholar.org/db3e/373bb6e7e7837ebc524da0a25903958554ed.pdf.