Probability 5 Ways

Posted on June 30, 2018
Tags: Probability, Haskell

Ever since the famous pearl by Erwig and Kollmansberger (2006), probabilistic programming with monads has been an interesting and diverse area in functional programming, with many different approaches.

I’m going to present five here, some of which I have not seen before.

The Classic

As presented in the paper, a simple and elegant formulation of probability distributions looks like this:

newtype Prob a
    = Prob
    { runProb :: [(a, Rational)]
    }

It’s a list of possible events, each tagged with their probability of happening. Here’s the probability distribution representing a die roll, for instance:

die :: Prob Integer
die = [ (x, 1/6) | x <- [1..6] ]

The semantics can afford to be a little fuzzy: it doesn’t hugely matter if the probabilities don’t add up to 1 (you can still extract meaningful answers when they don’t). However, I can’t see a way in which either negative probabilities or an empty list would make sense. It would be nice if those states were unrepresentable.

Its monadic structure multiplies conditional events:

instance Functor Prob where
    fmap f xs = Prob [ (f x, p) | (x,p) <- runProb xs ]
    
instance Applicative Prob where
    pure x = Prob [(x,1)]
    fs <*> xs
        = Prob
        [ (f x,fp*xp)
        | (f,fp) <- runProb fs
        , (x,xp) <- runProb xs ]
                     
instance Monad Prob where
    xs >>= f
        = Prob
        [ (y,xp*yp)
        | (x,xp) <- runProb xs
        , (y,yp) <- runProb (f x) ]

In most of the examples, we’ll need a few extra functions in order for the types to be useful. First is support:

support :: Prob a -> [a]
support = fmap fst . runProb

And second is expectation:

expect :: (a -> Rational) -> Prob a -> Rational
expect p xs = sum [ p x * xp | (x,xp) <- runProb xs ]

probOf :: (a -> Bool) -> Prob a -> Rational
probOf p = expect (bool 0 1 . p)

It’s useful to be able to construct uniform distributions:

uniform xs = Prob [ (x,n) | x <- xs ]
  where
    n = 1 % toEnum (length xs)
    
die = uniform [1..6]

>>> probOf (7==) $ do
  x <- die
  y <- die
  pure (x+y)
1 % 6

The Bells and Whistles

As elegant as the above approach is, it leaves something to be desired when it comes to efficiency. In particular, you’ll see a combinatorial explosion at every step. To demonstrate, let’s take the example above, using three-sided dice instead so it doesn’t take up too much space.

die = uniform [1..3]

example = do
  x <- die
  y <- die
  pure (x+y)

The probability table looks like this:

2 1/9
3 2/9
4 1/3
5 2/9
6 1/9

But the internal representation looks like this:

2 1/9
3 1/9
4 1/9
3 1/9
4 1/9
5 1/9
4 1/9
5 1/9
6 1/9

States are duplicated, because the implementation has no way of knowing that two outcomes are the same. We could collapse equivalent outcomes if we used a Map, but then we can’t implement Functor, Applicative, or Monad. The types:

class Functor f where
    fmap :: (a -> b) -> f a -> f b

class Functor f => Applicative f where
    pure :: a -> f a
    (<*>) :: f (a -> b) -> f a -> f b

class Applicative f => Monad f where
    (>>=) :: f a -> (a -> f b) -> f b

Don’t allow an Ord constraint, which is what we’d need to remove duplicates. We can instead make our own classes which do allow constraints:

{-# LANGUAGE RebindableSyntax #-}
{-# LANGUAGE TypeFamilies     #-}

import Prelude hiding (Functor(..),Applicative(..),Monad(..))

import Data.Kind

class Functor f where
    type Domain f a :: Constraint
    type Domain f a = ()
    fmap :: Domain f b => (a -> b) -> f a -> f b

class Functor f => Applicative f where
    {-# MINIMAL pure, liftA2 #-}
    pure   :: Domain f a => a -> f a
    liftA2 :: Domain f c => (a -> b -> c) -> f a -> f b -> f c
    
    (<*>) :: Domain f b => f (a -> b) -> f a -> f b
    (<*>) = liftA2 ($) 

class Applicative f => Monad f where
    (>>=) :: Domain f b => f a -> (a -> f b) -> f b

fail :: String -> a
fail = error

return :: (Applicative f, Domain f a) => a -> f a
return = pure

This setup gets over a couple common annoyances in Haskell, like making Data.Set a Monad:

instance Functor Set where
    type Domain Set a = Ord a
    fmap = Set.map

instance Applicative Set where
    pure = Set.singleton
    liftA2 f xs ys = do
        x <- xs
        y <- ys
        pure (f x y)

instance Monad Set where
    (>>=) = flip foldMap

And, of course, the probability monad:

newtype Prob a = Prob
    { runProb :: Map a Rational
    }

instance Functor Prob where
    type Domain Prob a = Ord a
    fmap f = Prob . Map.mapKeysWith (+) f . runProb

instance Applicative Prob where
    pure x = Prob (Map.singleton x 1)
    liftA2 f xs ys = do
      x <- xs
      y <- ys
      pure (f x y)
      
instance Ord a => Monoid (Prob a) where
    mempty = Prob Map.empty
    mappend (Prob xs) (Prob ys) = Prob (Map.unionWith (+) xs ys)

instance Monad Prob where
    Prob xs >>= f
        = Map.foldMapWithKey ((Prob .) . flip (Map.map . (*)) . runProb . f) xs

support = Map.keys . runProb

expect p = getSum . Map.foldMapWithKey (\k v -> Sum (p k * v)) . runProb

probOf p = expect (bool 0 1 . p)

uniform xs = Prob (Map.fromList [ (x,n) | x <- xs ])
  where
    n = 1 % toEnum (length xs)

ifThenElse True t _ = t
ifThenElse False _ f = f

die = uniform [1..6]

>>> probOf (7==) $ do
  x <- die
  y <- die
  pure (x + y)
1 % 6

Free

Coming up with the right implementation all at once is quite difficult: luckily, there are more general techniques for designing DSLs that break the problem into smaller parts, which also give us some insight into the underlying composition of the probability monad.

The technique relies on an algebraic concept called “free objects”. A free object for some class is a minimal implementation of that class. The classic example is lists: they’re the free monoid. Monoid requires that you have an additive operation, an empty element, and that the additive operation be associative. Lists have all of these things: what makes them free, though, is that they have nothing else. For instance, the additive operation on lists (concatenation) isn’t commutative: if it was, they wouldn’t be the free monoid any more, because they satisfy an extra law that’s not in monoid.

For our case, we can use the free monad: this takes a functor and gives it a monad instance, in a way we know will satisfy all the laws. This encoding is used in several papers (Ścibior, Ghahramani, and Gordon 2015; Larsen 2011).

The idea is to first figure out what primitive operation you need. We’ll use weighted choice:

choose :: Prob a -> Rational -> Prob a -> Prob a
choose = ...

Then you encode it as a functor:

data Choose a
    = Choose Rational a a
    deriving (Functor,Foldable)

We’ll say the left-hand-choice has chance pp, and the right-hand 1p1-p. Then, you just wrap it in the free monad:

type Prob = Free Choose

And you already have a monad instance. Support comes from the Foldable instance:

import Data.Foldable

support :: Prob a -> [a]
support = toList

Expectation is an “interpreter” for the DSL:

expect :: (a -> Rational) -> Prob a -> Rational
expect p = iter f . fmap p
  where
    f (Choose c l r) = l * c + r * (1-c)

For building up the tree, we can use Huffman’s algorithm:

fromList :: (a -> Rational) -> [a] -> Prob a
fromList p = go . foldMap (\x -> singleton (p x) (Pure x))
  where
    go xs = case minView xs of
      Nothing -> error "empty list"
      Just ((xp,x),ys) -> case minView ys of
        Nothing -> x
        Just ((yp,y),zs) ->
          go (insertHeap (xp+yp) (Free (Choose (xp/(xp+yp)) x y)) zs)

And finally, it gets the same notation as before:

uniform = fromList (const 1)

die = uniform [1..6]

probOf p = expect (bool 0 1 . p)

>>> probOf (7==) $ do
  x <- die
  y <- die
  pure (x + y)
1 % 6

One of the advantages of the free approach is that it’s easy to define multiple interpreters. We could, for instance, write an interpreter that constructs a diagram:

>>> drawTree ((,) <$> uniform "abc" <*> uniform "de")
           ┌('c','d')
1 % 2┤
     │     └('c','e')
1 % 3┤
     │           ┌('a','d')
     │     ┌1 % 2┤
     │     │     └('a','e')
1 % 2┤
           │     ┌('b','d')
1 % 2┤
                 └('b','e')

Final

There’s a lot to be said about free objects in category theory, also. Specifically, they’re related to initial and terminal (also called final) objects. The encoding above is initial, the final encoding is simply Cont:

newtype Cont r a = Cont { runCont :: (a -> r) -> r }

type Prob = Cont Rational

Here, also, we get the monad instance for free. In contrast to previously, expect is free:

expect = flip runCont

Support, though, isn’t possible.

This version is also called the Giry monad: there’s a deep and fascinating theory behind it, which I probably won’t be able to do justice to here. Check out Jared Tobin’s post (2017) for a good deep dive on it.

Cofree

The branching structure of the tree captures the semantics of the probability monad well, but it doesn’t give us much insight into the original implementation. The question is, how can we deconstruct this:

newtype Prob a
    = Prob
    { runProb :: [(a, Rational)]
    }

Eric Kidd (2007) pointed out that the monad is the composition of the writer and list monads:

type Prob = WriterT (Product Rational) []

but that seems unsatisfying: in contrast to the tree-based version, we don’t encode any branching structure, we’re able to have empty distributions, and it has the combinatorial explosion problem.

Adding a weighting to nondeterminism is encapsulated more concretely by the ListT transformer. It looks like this:

newtype ListT m a
    = ListT
    { runListT :: m (Maybe (a, ListT m a))
    }

It’s a cons-list, with an effect before every layer1.

While this can be used to give us the monad we need, I’ve found that something more like this fits the abstraction better:

data ListT m a
    = ListT a (m (Maybe (ListT m a)))

It’s a nonempty list, with the first element exposed. Turns out this is very similar to the cofree comonad:

data Cofree f a = a :< f (Cofree f a)

Just like the initial free encoding, we can start with a primitive operation:

data Perhaps a
    = Impossible
    | WithChance Rational a
    deriving (Functor,Foldable)

And we get all of our instances as well:

newtype Prob a
    = Prob
    { runProb :: Cofree Perhaps a
    } deriving (Functor,Foldable)
    
instance Comonad Prob where
    extract (Prob xs) = extract xs
    duplicate (Prob xs) = Prob (fmap Prob (duplicate xs))

foldProb :: (a -> Rational -> b -> b) -> (a -> b) -> Prob a -> b
foldProb f b = r . runProb
  where
    r (x :< Impossible) = b x
    r (x :< WithChance p xs) = f x p (r xs)

uniform :: [a] -> Prob a
uniform (x:xs) = Prob (coiterW f (EnvT (length xs) (x :| xs)))
  where
    f (EnvT 0 (_ :| [])) = Impossible
    f (EnvT n (_ :| (y:ys))) 
        = WithChance (1 % fromIntegral n) (EnvT (n - 1) (y:|ys))

expect :: (a -> Rational) -> Prob a -> Rational
expect p = foldProb f p
  where
    f x n xs = (p x * n + xs) / (n + 1)

probOf :: (a -> Bool) -> Prob a -> Rational
probOf p = expect (\x -> if p x then 1 else 0)

instance Applicative Prob where
    pure x = Prob (x :< Impossible)
    (<*>) = ap
    
append :: Prob a -> Rational -> Prob a -> Prob a
append = foldProb f (\x y ->  Prob . (x :<) . WithChance y . runProb)
  where
    f e r a p = Prob . (e :<) . WithChance ip . runProb . a op
      where
        ip = p * r / (p + r + 1)
        op = p / (r + 1)

instance Monad Prob where
    xs >>= f = foldProb (append . f) f xs

We see here that we’re talking about gambling-style odds, rather than probability. I wonder if the two representations are dual somehow?

The application of comonads to streams (ListT) has been explored before (Uustalu and Vene 2005); I wonder if there are any insights to be gleaned from this particular probability comonad.

References

Erwig, Martin, and Steve Kollmansberger. 2006. “Functional pearls: Probabilistic functional programming in Haskell.” Journal of Functional Programming 16 (1): 21–34. doi:10.1017/S0956796805005721. http://web.engr.oregonstate.edu/~erwig/papers/abstracts.html\#JFP06a.

Kidd, Eric. 2007. “Build your own probability monads.” http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.9502\&rep=rep1\&type=pdf.

Larsen, Ken Friis. 2011. “Memory Efficient Implementation of Probability Monads.” http://www.diku.dk/~kflarsen/t/ProbMonad-unpublished.pdf.

Ścibior, Adam, Zoubin Ghahramani, and Andrew D. Gordon. 2015. “Practical Probabilistic Programming with Monads.” In Proceedings of the 2015 ACM SIGPLAN Symposium on Haskell, 50:165–176. Haskell ’15. New York, NY, USA: ACM. doi:10.1145/2804302.2804317. http://mlg.eng.cam.ac.uk/pub/pdf/SciGhaGor15.pdf.

Tobin, Jared. 2017. “Implementing the Giry Monad.” jtobin.io. https://jtobin.io/giry-monad-implementation.

Uustalu, Tarmo, and Varmo Vene. 2005. “The Essence of Dataflow Programming.” In Proceedings of the Third Asian Conference on Programming Languages and Systems, 2–18. APLAS’05. Berlin, Heidelberg: Springer-Verlag. doi:10.1007/11575467_2. http://dx.doi.org/10.1007/11575467_2.


  1. Note this is not the same as the ListT in transformers; instead it’s a “ListT done right”.