## Probability 5 Ways

Ever since the famous pearl by Erwig and Kollmansberger (2006), probabilistic programming with monads has been an interesting and diverse area in functional programming, with many different approaches.

I’m going to present five here, some of which I have not seen before.

# The Classic

As presented in the paper, a simple and elegant formulation of probability distributions looks like this:

```
newtype Prob a
= Prob
runProb :: [(a, Rational)]
{ }
```

It’s a list of possible events, each tagged with their probability of happening. Here’s the probability distribution representing a die roll, for instance:

```
die :: Prob Integer
= [ (x, 1/6) | x <- [1..6] ] die
```

The semantics can afford to be a little fuzzy: it doesn’t hugely matter if the probabilities don’t add up to 1 (you can still extract meaningful answers when they don’t). However, I can’t see a way in which either negative probabilities or an empty list would make sense. It would be nice if those states were unrepresentable.

Its monadic structure multiplies conditional events:

```
instance Functor Prob where
fmap f xs = Prob [ (f x, p) | (x,p) <- runProb xs ]
instance Applicative Prob where
pure x = Prob [(x,1)]
<*> xs
fs = Prob
*xp)
[ (f x,fp| (f,fp) <- runProb fs
<- runProb xs ]
, (x,xp)
instance Monad Prob where
>>= f
xs = Prob
*yp)
[ (y,xp| (x,xp) <- runProb xs
<- runProb (f x) ] , (y,yp)
```

In most of the examples, we’ll need a few extra functions in order for the types to be useful. First is support:

```
support :: Prob a -> [a]
= fmap fst . runProb support
```

And second is expectation:

```
expect :: (a -> Rational) -> Prob a -> Rational
= sum [ p x * xp | (x,xp) <- runProb xs ]
expect p xs
probOf :: (a -> Bool) -> Prob a -> Rational
= expect (bool 0 1 . p) probOf p
```

It’s useful to be able to construct uniform distributions:

```
= Prob [ (x,n) | x <- xs ]
uniform xs where
= 1 % toEnum (length xs)
n
= uniform [1..6]
die
>>> probOf (7==) $ do
<- die
x <- die
y pure (x+y)
1 % 6
```

# The Bells and Whistles

As elegant as the above approach is, it leaves something to be desired when it comes to efficiency. In particular, you’ll see a combinatorial explosion at every step. To demonstrate, let’s take the example above, using three-sided dice instead so it doesn’t take up too much space.

```
= uniform [1..3]
die
= do
example <- die
x <- die
y pure (x+y)
```

The probability table looks like this:

```
2 1/9
3 2/9
4 1/3
5 2/9
6 1/9
```

But the internal representation looks like this:

```
2 1/9
3 1/9
4 1/9
3 1/9
4 1/9
5 1/9
4 1/9
5 1/9
6 1/9
```

States are duplicated, because the implementation has no way of
knowing that two outcomes are the same. We could collapse equivalent
outcomes if we used a `Map`

, but then
we can’t implement `Functor`

, `Applicative`

,
or `Monad`

. The
types:

```
class Functor f where
fmap :: (a -> b) -> f a -> f b
class Functor f => Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
class Applicative f => Monad f where
(>>=) :: f a -> (a -> f b) -> f b
```

Don’t allow an `Ord`

constraint, which is what we’d need to remove duplicates. We can instead
make our own classes which *do* allow constraints:

```
{-# LANGUAGE RebindableSyntax #-}
{-# LANGUAGE TypeFamilies #-}
import Prelude hiding (Functor(..),Applicative(..),Monad(..))
import Data.Kind
class Functor f where
type Domain f a :: Constraint
type Domain f a = ()
fmap :: Domain f b => (a -> b) -> f a -> f b
class Functor f => Applicative f where
{-# MINIMAL pure, liftA2 #-}
pure :: Domain f a => a -> f a
liftA2 :: Domain f c => (a -> b -> c) -> f a -> f b -> f c
(<*>) :: Domain f b => f (a -> b) -> f a -> f b
<*>) = liftA2 ($)
(
class Applicative f => Monad f where
(>>=) :: Domain f b => f a -> (a -> f b) -> f b
fail :: String -> a
fail = error
return :: (Applicative f, Domain f a) => a -> f a
return = pure
```

This setup gets over a couple common annoyances in Haskell, like
making `Data.Set`

a
Monad:

```
instance Functor Set where
type Domain Set a = Ord a
fmap = Set.map
instance Applicative Set where
pure = Set.singleton
= do
liftA2 f xs ys <- xs
x <- ys
y pure (f x y)
instance Monad Set where
>>=) = flip foldMap (
```

And, of course, the probability monad:

```
newtype Prob a = Prob
runProb :: Map a Rational
{
}
instance Functor Prob where
type Domain Prob a = Ord a
fmap f = Prob . Map.mapKeysWith (+) f . runProb
instance Applicative Prob where
pure x = Prob (Map.singleton x 1)
= do
liftA2 f xs ys <- xs
x <- ys
y pure (f x y)
instance Ord a => Monoid (Prob a) where
mempty = Prob Map.empty
mappend (Prob xs) (Prob ys) = Prob (Map.unionWith (+) xs ys)
instance Monad Prob where
Prob xs >>= f
= Map.foldMapWithKey ((Prob .) . flip (Map.map . (*)) . runProb . f) xs
= Map.keys . runProb
support
= getSum . Map.foldMapWithKey (\k v -> Sum (p k * v)) . runProb
expect p
= expect (bool 0 1 . p)
probOf p
= Prob (Map.fromList [ (x,n) | x <- xs ])
uniform xs where
= 1 % toEnum (length xs)
n
True t _ = t
ifThenElse False _ f = f
ifThenElse
= uniform [1..6]
die
>>> probOf (7==) $ do
<- die
x <- die
y pure (x + y)
1 % 6
```

# Free

Coming up with the right implementation all at once is quite difficult: luckily, there are more general techniques for designing DSLs that break the problem into smaller parts, which also give us some insight into the underlying composition of the probability monad.

The technique relies on an algebraic concept called “free objects”. A
free object for some class is a minimal implementation of that class.
The classic example is lists: they’re the free monoid. Monoid requires
that you have an additive operation, an empty element, and that the
additive operation be associative. Lists have all of these things: what
makes them *free*, though, is that they have nothing else. For
instance, the additive operation on lists (concatenation) isn’t
commutative: if it was, they wouldn’t be the free monoid any more,
because they satisfy an extra law that’s not in monoid.

For our case, we can use the free monad: this takes a functor and gives it a monad instance, in a way we know will satisfy all the laws. This encoding is used in several papers (Ścibior, Ghahramani, and Gordon 2015; Larsen 2011).

The idea is to first figure out what primitive operation you need. We’ll use weighted choice:

```
choose :: Prob a -> Rational -> Prob a -> Prob a
= ... choose
```

Then you encode it as a functor:

```
data Choose a
= Choose Rational a a
deriving (Functor,Foldable)
```

We’ll say the left-hand-choice has chance $p$, and the right-hand $1-p$. Then, you just wrap it in the free monad:

`type Prob = Free Choose`

And you already have a monad instance. Support comes from the `Foldable`

instance:

```
import Data.Foldable
support :: Prob a -> [a]
= toList support
```

Expectation is an “interpreter” for the DSL:

```
expect :: (a -> Rational) -> Prob a -> Rational
= iter f . fmap p
expect p where
Choose c l r) = l * c + r * (1-c) f (
```

For building up the tree, we can use Huffman’s algorithm:

```
fromList :: (a -> Rational) -> [a] -> Prob a
= go . foldMap (\x -> singleton (p x) (Pure x))
fromList p where
= case minView xs of
go xs Nothing -> error "empty list"
Just ((xp,x),ys) -> case minView ys of
Nothing -> x
Just ((yp,y),zs) ->
+yp) (Free (Choose (xp/(xp+yp)) x y)) zs) go (insertHeap (xp
```

And finally, it gets the same notation as before:

```
= fromList (const 1)
uniform
= uniform [1..6]
die
= expect (bool 0 1 . p)
probOf p
>>> probOf (7==) $ do
<- die
x <- die
y pure (x + y)
1 % 6
```

One of the advantages of the free approach is that it’s easy to define multiple interpreters. We could, for instance, write an interpreter that constructs a diagram:

```
>>> drawTree ((,) <$> uniform "abc" <*> uniform "de")
'c','d')
┌(1 % 2┤
┌'c','e')
│ └(1 % 3┤
'a','d')
│ ┌(1 % 2┤
│ ┌'a','e')
│ │ └(1 % 2┤
└'b','d')
│ ┌(1 % 2┤
└'b','e') └(
```

# Final

There’s a lot to be said about free objects in category theory, also.
Specifically, they’re related to initial and terminal (also called
final) objects. The encoding above is initial, the final encoding is
simply `Cont`

:

```
newtype Cont r a = Cont { runCont :: (a -> r) -> r }
type Prob = Cont Rational
```

Here, also, we get the monad instance for free. In contrast to previously, expect is free:

`= flip runCont expect `

Support, though, isn’t possible.

This version is also called the Giry monad: there’s a deep and fascinating theory behind it, which I probably won’t be able to do justice to here. Check out Jared Tobin’s post -Tobin (2017) for a good deep dive on it.

# Cofree

The branching structure of the tree captures the semantics of the probability monad well, but it doesn’t give us much insight into the original implementation. The question is, how can we deconstruct this:

```
newtype Prob a
= Prob
runProb :: [(a, Rational)]
{ }
```

Eric Kidd -Kidd (2007) pointed out that the monad is the composition of the writer and list monads:

`type Prob = WriterT (Product Rational) []`

but that seems unsatisfying: in contrast to the tree-based version, we don’t encode any branching structure, we’re able to have empty distributions, and it has the combinatorial explosion problem.

Adding a weighting to nondeterminism is encapsulated more concretely
by the `ListT`

transformer. It looks like this:

```
newtype ListT m a
= ListT
runListT :: m (Maybe (a, ListT m a))
{ }
```

It’s a cons-list, with an effect before every layer^{1}.

While this can be used to give us the monad we need, I’ve found that something more like this fits the abstraction better:

```
data ListT m a
= ListT a (m (Maybe (ListT m a)))
```

It’s a nonempty list, with the first element exposed. Turns out this is very similar to the cofree comonad:

`data Cofree f a = a :< f (Cofree f a)`

Just like the initial free encoding, we can start with a primitive operation:

```
data Perhaps a
= Impossible
| WithChance Rational a
deriving (Functor,Foldable)
```

And we get all of our instances as well:

```
newtype Prob a
= Prob
runProb :: Cofree Perhaps a
{deriving (Functor,Foldable)
}
instance Comonad Prob where
Prob xs) = extract xs
extract (Prob xs) = Prob (fmap Prob (duplicate xs))
duplicate (
foldProb :: (a -> Rational -> b -> b) -> (a -> b) -> Prob a -> b
= r . runProb
foldProb f b where
:< Impossible) = b x
r (x :< WithChance p xs) = f x p (r xs)
r (x
uniform :: [a] -> Prob a
:xs) = Prob (coiterW f (EnvT (length xs) (x :| xs)))
uniform (xwhere
EnvT 0 (_ :| [])) = Impossible
f (EnvT n (_ :| (y:ys)))
f (= WithChance (1 % fromIntegral n) (EnvT (n - 1) (y:|ys))
expect :: (a -> Rational) -> Prob a -> Rational
= foldProb f p
expect p where
= (p x * n + xs) / (n + 1)
f x n xs
probOf :: (a -> Bool) -> Prob a -> Rational
= expect (\x -> if p x then 1 else 0)
probOf p
instance Applicative Prob where
pure x = Prob (x :< Impossible)
<*>) = ap
(
append :: Prob a -> Rational -> Prob a -> Prob a
= foldProb f (\x y -> Prob . (x :<) . WithChance y . runProb)
append where
= Prob . (e :<) . WithChance ip . runProb . a op
f e r a p where
= p * r / (p + r + 1)
ip = p / (r + 1)
op
instance Monad Prob where
>>= f = foldProb (append . f) f xs xs
```

We see here that we’re talking about gambling-style odds, rather than probability. I wonder if the two representations are dual somehow?

The application of comonads to streams (`ListT`

) has
been explored before (Uustalu and Vene 2005); I
wonder if there are any insights to be gleaned from this particular
probability comonad.

# References

*Journal of Functional Programming*16 (1): 21–34. doi:10.1017/S0956796805005721.

*Proceedings of the 2015 ACM SIGPLAN Symposium on Haskell*, 50:165–176. Haskell ’15. New York, NY, USA: ACM. doi:10.1145/2804302.2804317.

*jtobin.io*.

*Proceedings of the Third Asian Conference on Programming Languages and Systems*, 2–18. APLAS’05. Berlin, Heidelberg: Springer-Verlag. doi:10.1007/11575467_2.

Note this is

*not*the same as the`ListT`

in transformers; instead it’s a “ListT done right”.↩︎