Revisiting a Trie in Haskell
Conforming to Foldable
When I ended the last post, I had a nice Trie datatype,
with plenty of functions, but I couldn’t get it to conform to the
standard Haskell classes. The problem was to do with the type variables
in the Trie:
data OldTrie a = OldTrie
{ otEndHere :: Bool
, otChildren :: Map a (OldTrie a) }Although the type variable is a, the trie really contains
lists of as. At least,
that’s what’s reflected in functions like insert, member, etc.:
member :: (Foldable f, Ord a) => f a -> OldTrie a -> Bool
member = foldr f otEndHere where
f e a = maybe False a . Map.lookup e . otChildren
otInsert :: (Foldable f, Ord a) => f a -> OldTrie a -> OldTrie a
otInsert = foldr f b where
b (OldTrie _ c) = OldTrie True c
f e a (OldTrie n c) = OldTrie n (Map.alter (Just . a . fold) e c)
instance Ord a => Monoid (OldTrie a) where
mempty = OldTrie False mempty
OldTrie v c `mappend` OldTrie t d =
OldTrie (v || t) (Map.unionWith (<>) c d)Realistically, the type which the trie contains is more like:
Foldable f => Trie (f a)That signature strongly hints at GADTs, as was indicated by this stackoverflow answer. The particular GADT which is applicable here is this:
data TrieSet a where TrieSet :: Bool -> Map a (TrieSet [a]) -> TrieSet [a]Why lists and not a general Foldable?
Well, for the particular use I had in mind (conforming to the Foldable
typeclass), I need (:).
instance Foldable TrieSet where
foldr f b (TrieSet e c) = if e then f [] r else r where
r = Map.foldrWithKey (flip . g . (:)) b c
g k = foldr (f . k)With some more helper functions, the interface becomes pretty nice:
instance Show a => Show (TrieSet [a]) where
showsPrec d t =
showParen
(d > 10)
(showString "fromList " . shows (foldr (:) [] t))
instance Ord a => IsList (TrieSet [a]) where
type Item (TrieSet [a]) = [a]
fromList = foldr tsInsert mempty
toList = foldr (:) []The trie has the side-effect of lexicographically sorting what it’s given:
fromList ["ced", "abc", "ced", "cb", "ab"] :: TrieSet String
fromList ["ab","abc","cb","ced"]Further Generalizing
Most implementations of tries that I’ve seen are map-like data
structures, rather than set-like. In other words, instead of holding a
Bool at
the value position, it holds a Maybe
something.
data Trie a b = Trie
{ endHere :: b
, children :: Map a (Trie a b)
} deriving (Eq, Ord, Show, Functor, Foldable, Traversable)This is a much more straightforward datatype. Foldable can
even be automatically derived.
However, I haven’t made the endHere field a Maybe a. I
want to be able to write something like this:
type TrieSet [a] = Trie a Bool
type TrieMap a b = Trie a (Maybe b)And have it automatically choose the implementation of the functions I need1.
To do that, though, I’ll need to write the base functions, agnostic
of the type of b. I can rely on something like
Monoid,
though:
instance (Ord a, Monoid b) => Monoid (Trie a b) where
mempty = Trie mempty Map.empty
mappend (Trie v k) (Trie t l) =
Trie (v <> t) (Map.unionWith (<>) k l)In fact, quite a lot of functions naturally lend themselves to this fold + monoid style:
lookup :: (Ord a, Monoid b, Foldable f)
=> f a -> Trie a b -> b
lookup = foldr f endHere where
f e a = foldMap a . Map.lookup e . children
insert' :: (Foldable f, Ord a, Monoid b)
=> f a -> b -> Trie a b -> Trie a b
insert' xs v = foldr f b xs where
b (Trie p c) = Trie (v <> p) c
f e a (Trie n c) =
Trie n (Map.alter (Just . a . fold) e c) A monoid is needed for the values, though, and neither Bool nor ∀ a. Maybe a
conform to Monoid.
Looking back to the implementation of the trie-set, the (||) function
has been replaced by mappend. There
is a newtype wrapper in Data.Monoid
which has exactly this behaviour, though: Any.
Using that, the type signatures specialize to:
type TrieSet a = Trie a Any
lookup :: (Ord a, Foldable f)
=> f a -> TrieSet a -> Any
insert :: (Ord a, Foldable f)
=> f a -> Any -> TrieSet a -> TrieSet aSimilarly, for Maybe, there’s
both First and
Last.
They have the behaviour:
First (Just x) <> First (Just y) == First (Just x)Last (Just x) <> Last (Just y) == Last (Just y)I think it makes more sense for a value inserted into a map to
overwrite whatever was there before. Since the newer value is on the
left in the mappend, then,
First
makes most sense.
type TrieMap a b = Trie a (First b)
lookup :: (Ord a, Foldable f) => f a -> TrieMap a b -> First b
insert :: (Ord a, Foldable f)
=> f a -> First b -> TrieMap a b -> TrieMap a bThere are some other ways that you can interpret the monoid. For
instance, subbing in Sum Int
gives you a bag-like trie:
type TrieBag a = Trie a (Sum Int)
lookup :: (Ord a, Foldable f) => f a -> TrieBag a -> Sum Int
insert :: (Ord a, Foldable f)
=> f a -> Sum Int -> TrieBag a -> TrieBag aThis is a set which can store multiple copies of each member. Turned the other way around, a map which stores many values for each key looks like this:
type TrieBin a b = Trie a [b]
lookup :: (Ord a, Foldable f) => f a -> TrieBin a b -> [b]
insert :: (Ord a, Foldable f)
=> f a -> [b] -> TrieBin a b -> TrieBin a bThis method so far isn’t really satisfying, though. Really, the insert signatures should look like
this:
insert :: (Ord a, Foldable f)
=> f a -> b -> TrieMap a b -> TrieMap a b
insert :: (Ord a, Foldable f)
=> f a -> b -> TrieBin a b -> TrieBin a bModifying insert slightly, you can get exactly that:
insert :: (Foldable f, Ord a, Applicative c, Monoid (c b))
=> f a -> b -> Trie a (c b) -> Trie a (c b)
insert xs v = foldr f b xs where
b (Trie p c) = Trie (pure v <> p) c
f e a (Trie n c) = Trie n (Map.alter (Just . a . fold) e c)pure
from Applicative is
needed for the “embedding”.
Similarly, the “inserting” for the set-like types isn’t really right. The value argument is out of place. This should be the signature:
add :: (Ord a, Foldable f)
=> f a -> TrieSet a -> TrieSet a
add :: (Ord a, Foldable f)
=> f a -> TrieBin a -> TrieBin aIn particular, while we have an “empty” thing (0, False) for monoids, we need a “one” thing (1, True) for this function. A semiring2 gives this exact method:
class Monoid a => Semiring a where
one :: a
mul :: a -> a -> a
instance Num a => Semiring (Sum a) where
one = 1
mul = (*)
instance Semiring Any where
one = Any True
Any x `mul` Any y = Any (x && y)This class is kind of like a combination of both monoid wrappers for
both Int
and Bool. You
could take advantage of that:
class (Monoid add, Monoid mult)
=> SemiringIso a add mult | a -> add, a -> mult where
toAdd :: a -> add
fromAdd :: add -> a
toMult :: a -> mult
fromMult :: mult -> a
(<+>), (<.>) :: SemiringIso a add mult => a -> a -> a
x <+> y = fromAdd (toAdd x <> toAdd y)
x <.> y = fromMult (toMult x <> toMult y)
instance SemiringIso Int (Sum Int) (Product Int) where
toAdd = Sum
fromAdd = getSum
toMult = Product
fromMult = getProduct
instance SemiringIso Bool Any All where
toAdd = Any
fromAdd = getAny
toMult = All
fromMult = getAllBut it seems like overkill.
Anyway, assuming that we have the functions from Semiring,
here’s the add function:
add :: (Foldable f, Ord a, Semiring b)
=> f a -> Trie a b -> Trie a b
add xs = foldr f b xs where
b (Trie p c) = Trie (one <> p) c
f e a (Trie n c) =
Trie n (Map.alter (Just . a . fold) e c)Now, expressions can be built up without specifying the specific monoid implementation, and the whole behaviour can be changed with a type signature:
ans = lookup "abc" (fromList ["abc", "def", "abc", "ghi"])ans :: Sum Int
Sum {getSum = 2}ans :: Any
Any {getAny = True}Slightly fuller implementations of all of these are available here.
Kind of like program inference in lieu of type inference↩︎
This isn’t really a very good definition of semiring. While Haskell doesn’t have this class in base, Purescript has it in their prelude.↩︎