Revisiting a Trie in Haskell

Posted on September 26, 2016
Part 2 of a 2-part series on tries
Tags: ,

Conforming to Foldable

When I ended the last post, I had a nice Trie datatype, with plenty of functions, but I couldn’t get it to conform to the standard Haskell classes. The problem was to do with the type variables in the Trie:

{-# language GADTs, FlexibleInstances, TypeFamilies #-}
{-# language DeriveFoldable, DeriveFunctor, DeriveTraversable #-}
{-# language FunctionalDependencies, FlexibleInstances #-}

module Tries where

import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import Data.Foldable hiding (toList)
import Prelude hiding (lookup)
import Data.Monoid
import GHC.Exts (IsList(..))
data OldTrie a = OldTrie
  { otEndHere  :: Bool
  , otChildren :: Map a (OldTrie a) }

Although the type variable is a, the trie really contains lists of as. At least, that’s what’s reflected in functions like insert, member, etc.:

member :: (Foldable f, Ord a) => f a -> OldTrie a -> Bool
member = foldr f otEndHere where
  f e a = maybe False a . Map.lookup e . otChildren
  
otInsert :: (Foldable f, Ord a) => f a -> OldTrie a -> OldTrie a
otInsert = foldr f b where
  b (OldTrie _ c) = OldTrie True c
  f e a (OldTrie n c) = OldTrie n (Map.alter (Just . a . fold) e c)
  
instance Ord a => Monoid (OldTrie a) where
  mempty = OldTrie False mempty
  OldTrie v c `mappend` OldTrie t d = 
    OldTrie (v || t) (Map.unionWith (<>) c d)

Realistically, the type which the trie contains is more like:

Foldable f => Trie (f a)

That signature strongly hints at GADTs, as was indicated by this stackoverflow answer. The particular GADT which is applicable here is this:

data TrieSet a where TrieSet :: Bool -> Map a (TrieSet [a]) -> TrieSet [a]
tsEndHere :: TrieSet [a] -> Bool
tsEndHere (TrieSet e _) = e

tsChildren :: TrieSet [a] -> Map a (TrieSet [a])
tsChildren (TrieSet _ c) = c

tsInsert :: (Foldable f, Ord a) => f a -> TrieSet [a] -> TrieSet [a]
tsInsert = foldr f b where
  b :: TrieSet [a] -> TrieSet [a]
  f :: Ord a => a -> (TrieSet [a] -> TrieSet [a]) -> TrieSet [a] -> TrieSet [a]

  b (TrieSet _ c) = TrieSet True c
  f e a (TrieSet n c) = TrieSet n (Map.alter (Just . a . fold) e c)
  
instance Ord a => Monoid (TrieSet [a]) where
  mempty = TrieSet False Map.empty
  TrieSet v c `mappend` TrieSet t d = 
    TrieSet (v || t) (Map.unionWith (<>) c d)

Why lists and not a general Foldable? Well, for the particular use I had in mind (conforming to the Foldable typeclass), I need (:).

instance Foldable TrieSet where
  foldr f b (TrieSet e c) = if e then f [] r else r where
    r = Map.foldrWithKey (flip . g . (:)) b c
    g k = foldr (f . k)

With some more helper functions, the interface becomes pretty nice:

instance Show a => Show (TrieSet [a]) where
  showsPrec d t = 
    showParen 
      (d > 10)
      (showString "fromList " . shows (foldr (:) [] t))

instance Ord a => IsList (TrieSet [a]) where
  type Item (TrieSet [a]) = [a]
  fromList = foldr tsInsert mempty
  toList = foldr (:) []

The trie has the side-effect of lexicographically sorting what it’s given:

:set -XGADTs
fromList ["ced", "abc", "ced", "cb", "ab"] :: TrieSet String
fromList ["ab","abc","cb","ced"]

Further Generalizing

Most implementations of tries that I’ve seen are map-like data structures, rather than set-like. In other words, instead of holding a Bool at the value position, it holds a Maybe something.

data Trie a b = Trie
  { endHere  :: b
  , children :: Map a (Trie a b) 
  } deriving (Eq, Ord, Show, Functor, Foldable, Traversable)

This is a much more straightforward datatype. Foldable can even be automatically derived.

However, I haven’t made the endHere field a Maybe a. I want to be able to write something like this:

type TrieSet [a] = Trie a Bool
type TrieMap a b = Trie a (Maybe b)

And have it automatically choose the implementation of the functions I need1.

To do that, though, I’ll need to write the base functions, agnostic of the type of b. I can rely on something like Monoid, though:

instance (Ord a, Monoid b) => Monoid (Trie a b) where
  mempty = Trie mempty Map.empty
  mappend (Trie v k) (Trie t l) = 
    Trie (v <> t) (Map.unionWith (<>) k l)

In fact, quite a lot of functions naturally lend themselves to this fold + monoid style:

lookup :: (Ord a, Monoid b, Foldable f) 
       => f a -> Trie a b -> b
lookup = foldr f endHere where
  f e a = foldMap a . Map.lookup e . children

insert' :: (Foldable f, Ord a, Monoid b) 
       => f a -> b -> Trie a b -> Trie a b
insert' xs v = foldr f b xs where
  b (Trie p c) = Trie (v <> p) c
  f e a (Trie n c) = 
    Trie n (Map.alter (Just . a . fold) e c) 

A monoid is needed for the values, though, and neither Bool nor ∀ a. Maybe a conform to Monoid. Looking back to the implementation of the trie-set, the (||) function has been replaced by mappend. There is a newtype wrapper in Data.Monoid which has exactly this behaviour, though: Any.

Using that, the type signatures specialize to:

type TrieSet a = Trie a Any
lookup :: (Ord a, Foldable f) 
       => f a -> TrieSet a -> Any
insert :: (Ord a, Foldable f) 
       => f a -> Any -> TrieSet a -> TrieSet a

Similarly, for Maybe, there’s both First and Last. They have the behaviour:

First (Just x) <> First (Just y) == First (Just x)
Last  (Just x) <> Last  (Just y) == Last  (Just y)

I think it makes more sense for a value inserted into a map to overwrite whatever was there before. Since the newer value is on the left in the mappend, then, First makes most sense.

type TrieMap a b = Trie a (First b)
lookup :: (Ord a, Foldable f) => f a -> TrieMap a b -> First b
insert :: (Ord a, Foldable f) 
       => f a -> First b -> TrieMap a b -> TrieMap a b

There are some other ways that you can interpret the monoid. For instance, subbing in Sum Int gives you a bag-like trie:

type TrieBag a = Trie a (Sum Int)
lookup :: (Ord a, Foldable f) => f a -> TrieBag a -> Sum Int
insert :: (Ord a, Foldable f) 
       => f a -> Sum Int -> TrieBag a -> TrieBag a

This is a set which can store multiple copies of each member. Turned the other way around, a map which stores many values for each key looks like this:

type TrieBin a b = Trie a [b]
lookup :: (Ord a, Foldable f) => f a -> TrieBin a b -> [b]
insert :: (Ord a, Foldable f) 
       => f a -> [b] -> TrieBin a b -> TrieBin a b

This method so far isn’t really satisfying, though. Really, the insert signatures should look like this:

insert :: (Ord a, Foldable f) 
       => f a -> b -> TrieMap a b -> TrieMap a b
insert :: (Ord a, Foldable f)
       => f a -> b -> TrieBin a b -> TrieBin a b

Modifying insert slightly, you can get exactly that:

insert :: (Foldable f, Ord a, Applicative c, Monoid (c b)) 
       => f a -> b -> Trie a (c b) -> Trie a (c b)
insert xs v = foldr f b xs where
  b (Trie p c) = Trie (pure v <> p) c
  f e a (Trie n c) = Trie n (Map.alter (Just . a . fold) e c)

pure from Applicative is needed for the “embedding”.

Similarly, the “inserting” for the set-like types isn’t really right. The value argument is out of place. This should be the signature:

add :: (Ord a, Foldable f) 
    => f a -> TrieSet a -> TrieSet a
add :: (Ord a, Foldable f)
    => f a -> TrieBin a -> TrieBin a

In particular, while we have an “empty” thing (0, False) for monoids, we need a “one” thing (1, True) for this function. A semiring2 gives this exact method:

class Monoid a => Semiring a where
  one :: a
  mul :: a -> a -> a
  
instance Num a => Semiring (Sum a) where
  one = 1
  mul = (*)

instance Semiring Any where
  one = Any True
  Any x `mul` Any y = Any (x && y)

This class is kind of like a combination of both monoid wrappers for both Int and Bool. You could take advantage of that:


class (Monoid add, Monoid mult)
  => SemiringIso a add mult | a -> add, a -> mult where
    toAdd    :: a -> add
    fromAdd  :: add -> a
    toMult   :: a -> mult
    fromMult :: mult -> a
  
(<+>), (<.>) :: SemiringIso a add mult => a -> a -> a

x <+> y = fromAdd  (toAdd  x <> toAdd  y)
x <.> y = fromMult (toMult x <> toMult y)

instance SemiringIso Int (Sum Int) (Product Int) where
  toAdd    = Sum
  fromAdd  = getSum
  toMult   = Product
  fromMult = getProduct

instance SemiringIso Bool Any All where
  toAdd    = Any
  fromAdd  = getAny
  toMult   = All
  fromMult = getAll

But it seems like overkill.

Anyway, assuming that we have the functions from Semiring, here’s the add function:

add :: (Foldable f, Ord a, Semiring b) 
    => f a -> Trie a b -> Trie a b
add xs = foldr f b xs where
  b (Trie p c) = Trie (one <> p) c
  f e a (Trie n c) = 
    Trie n (Map.alter (Just . a . fold) e c)

Now, expressions can be built up without specifying the specific monoid implementation, and the whole behaviour can be changed with a type signature:

instance (Ord a, Semiring b) => IsList (Trie a b) where
  type Item (Trie a b) = [a]
  fromList = foldr add mempty
  toList = undefined
ans :: Semiring b => b
ans = lookup "abc" (fromList ["abc", "def", "abc", "ghi"])
ans :: Sum Int
Sum {getSum = 2}
ans :: Any
Any {getAny = True}

Slightly fuller implementations of all of these are available here.


  1. Kind of like program inference in lieu of type inference↩︎

  2. This isn’t really a very good definition of semiring. While Haskell doesn’t have this class in base, Purescript has it in their prelude.↩︎