## A Trie in Swift

If you google “cool data structures” you’ll get this as your first result. It’s a stackoverflow question: “What are the lesser known but useful data structures?”. And the top answer is a Trie. I read up on them, and found out a lot of cool things about their use (as well as finding out that I’m now the kind of person who googles “cool data structures”). So I rocked on up to my playground, and got writing.

A Trie is a prefix tree. It’s another recursive data structure: each Trie contains other children Tries, identifiable by their prefixes.

It’s a bit of a hipster data structure, not very widely used, but
it’s got some useful applications. It’s got set-like operations, with
insertion and searching each at
$O(n)$,
where
$n$
is the length of the sequence being searched for. A Set is the only way
to go for hashable, unordered elements. But, if you’ve got
*sequences* of hashable elements, a Trie might be for you. (one
thing to note is that Sets are hashable themselves, so if the sequences
you want to store are unordered, a Set of Sets is more applicable)

In Swift, we can do this by having every Trie contain a dictionary of prefixes and Tries. Something like this:

```
<Element : Hashable> {
public struct Trieprivate var children: [Element:Trie<Element>]
}
```

We don’t run into the problem of structs not being allowed to be
recursive here, because we don’t directly store a Trie within a Trie -
we store a *dictionary*, and therefore a reference to the child
Tries. In this dictionary, the keys correspond to the prefixes. So how
do we fill it up? Like lists, we can use the decomposition properties of
generators:

```
{
extension Trie private init<G : GeneratorType where G.Element == Element>(var gen: G) {
if let head = gen.next() {
= [head:Trie(gen:gen)]
children } else {
= [:]
children }
}
public init<S : SequenceType where S.Generator.Element == Element>
(_ seq: S) {
.init(gen: seq.generate())
self}
}
```

That’s not really enough. That can store one sequence, but we need an
`insert`

function. Here ya go:

```
{
extension Trie private mutating func insert
<G : GeneratorType where G.Element == Element>
(var gen: G) {
if let head = gen.next() {
[head]?.insert(gen) ?? {children[head] = Trie(gen: gen)}()
children}
}
public mutating func insert<S : SequenceType where S.Generator.Element == Element>
(seq: S) {
insert(seq.generate())
}
}
```

There’s a line in there that some may find offensive:

`[head]?.insert(gen) ?? {children[head] = Trie(gen: gen)}() children`

And, to be honest, I’m not a huge fan of it myself. It’s making use of the fact that you can call mutating methods on optionals with chaining. When you do it in this example, the optional is returned by the dictionary lookup: we then want to mutate that value, if it’s there, with an insertion.

If it’s *not* there, though, we want to add it in, so we’ve
got to have some way of understanding and dealing with that. We could
try and extract the child Trie, like this:

```
if let head = gen.next() {
if var child = children[head] {
.insert(gen)
child} else {
[head] = Trie(gen: gen)
children}
}
```

But the child there is just a copy of the actual child in the Trie we want to mutate. We could then set it back to the dictionary entry - but at this stage it feels like a lot of extra, inefficient work.

So, you can make use of the fact the functions which don’t return
anything actually *do* return something: a special value called
`Void`

, or
`()`

. Except
that, in this case, it’s `()?`

(or `Optional<Void>`

).
We’re not interested in the void itself, obviously, just whether or not
it’s `nil`

. So, one way you could
use it would be like this:

```
if let _ = children[head]?.insert(gen) { return }
[head] = Trie(gen: gen) children
```

Or, to use `guard`

:

`= children[head]?.insert(gen) else { children[head] = Trie(gen: gen) } guard let _ `

But I think the nil coalescing operator is a little clearer, without
the distraction of `let`

or `_`

.

This data structure, as you can see, has a very different feel to the
list. For a start, it’s much more mutable, with in-place mutating
methods being a little easier than methods that return a new Trie. Also,
laziness is pretty much out of the question: almost every imaginable
useful method would involve evaluation of the entire Trie. (if anyone
*does* have a useful way of thinking about Tries lazily, I’d love
to hear it)

The contains function, the most important of them all, is here:

```
{
extension Trie private func contains
<G : GeneratorType where G.Element == Element>
(var gen: G) -> Bool {
return gen.next().map{self.children[$0]?.contains(gen) ?? false} ?? true
}
public func contains<S : SequenceType where S.Generator.Element == Element>
(seq: S) -> Bool {
return contains(seq.generate())
}
}
```

So this uses more generators. If the generator is empty (`gen.next()`

returns `nil`

), then the Trie
contains that sequence, as we have not yet found a dictionary without
that element. Within the `map()`

we search for the next element from the generator. If *that*
returns `nil`

, then the Trie doesn’t
contain that sequence. Finally, if none of that works, return whether or
not the child Trie contains the rest of the generator. Let’s try it
out:

```
var jo = Trie([1, 2, 3])
.insert([4, 5, 6])
jo.insert([7, 8, 9])
jo
.contains([4, 5, 6]) // true
jo.contains([2, 1, 3]) // false jo
```

There’s a catch. The `contains`

method doesn’t work as we’d like it to:

`.contains([1, 2]) // true jo`

Because we return `true`

*whenever* the generator runs out, our Trie “contains” every
prefix of the sequences that have been inserted. This is not what we
want. One way to solve this may be to return `true`

only if the
last Trie found has no children. Something like this:

```
{
extension Trie private func contains
<G : GeneratorType where G.Element == Element>
(var gen: G) -> Bool {
return gen.next().map{self.children[$0]?.contains(gen) ?? false} ?? children.isEmpty
}
}
```

But this doesn’t really work either. what if we did `jo.insert([1, 2])`

?
Now, if we check if the Trie contains `[1, 2]`

,
we’ll get back `false`

.

It’s time for flags. We need to add an extra variable to our Trie: a Boolean, which describes whether or not that Trie represents the end of a sequence.

```
<Element : Hashable> {
public struct Trieprivate var children: [Element:Trie<Element>]
private var endHere : Bool
}
```

We’ll also need to change our `insert`

and `init`

functions, so that when the
generator returns `nil`

, `endHere`

gets initialised to `true`

.

```
{
extension Trie private init<G : GeneratorType where G.Element == Element>(var gen: G) {
if let head = gen.next() {
(children, endHere) = ([head:Trie(gen:gen)], false)
} else {
(children, endHere) = ([:], true)
}
}
}
{
extension Trie private mutating func insert
<G : GeneratorType where G.Element == Element>
(var gen: G) {
if let head = gen.next() {
[head]?.insert(gen) ?? {children[head] = Trie(gen: gen)}()
children} else {
= true
endHere }
}
}
```

And the contains function now returns `endHere`

, instead of true:

```
{
public extension Trie private func contains
<G : GeneratorType where G.Element == Element>
(var gen: G) -> Bool {
return gen.next().map{self.children[$0]?.contains(gen) ?? false} ?? endHere
}
}
```

While we’re improving the `contains`

function, we could use `guard`

to make it much more readable:

```
{
public extension Trie private func contains<
: GeneratorType where G.Element == Element
G >(var gen: G) -> Bool {
= gen.next() else { return endHere }
guard let head return children[head]?.contains(gen) ?? false
}
}
```

Chris Eidhof gave me this idea. (Apparently there’s a Trie implementation in Functional Programming in Swift, his book. I’ve not read it, but it’s on my list. If Advanced Swiftis anything to go by, it should be fantastic.)

The objective of this Trie is to replicate all of the Set methods:
Union, Intersect, etc. Most of those are manageable to build from just
`insert`

, `init`

, and `contains`

, but there’s one other function
that comes in handy: `remove`

.

Remove is deceptively difficult. You could just walk to the end of
your given sequence to remove, and switch `endHere`

from `true`

to `false`

, but
that’s kind of cheating. I mean, you’ll be storing the same amount of
information that way after a removal. No, what you need is something
that deletes branches of a tree that aren’t being used any more.

Again, this is a little complicated. You can’t just find the head of
the sequence you want to remove, and then delete all children: you may
be deleting other entries along with that. You *also* can’t just
delete when a given Trie only contains one child: that child may branch
off subsequently, or it may contain prefixes for the sequence you want
to remove.

Crucially, all of the information telling you whether or not you can
delete a given entry in a given Trie will come from the
*children* of that Trie. What I decided to go with was this: I’ll
have some mutating method that does the work recursively. However, this
method also *returns* a value, representing some important
information for whatever called it. In this case, the `remove`

method would remove, as you’d
imagine, but it will also return a Boolean, signifying whether the Trie
it was called on can be removed. Since I used the normal structure of
having a private method take a generator, and then a public wrapper
method take a sequence, I could have the public method just discard the
Boolean.

Let’s go through it. Here’s the signature:

```
private mutating func remove<
: GeneratorType where G.Element == Element
G >(var g: G) -> Bool {
```

No surprises there. Similar to the other methods. Then, get the head from the generator:

`if let head = g.next() {`

Within that if block is the meat of the logic, so I might skip to
what happens if `g.next()`

returns `nil`

for the start:

```
private mutating func remove<
: GeneratorType where G.Element == Element
G >(var g: G) -> Bool {
if let head = g.next() {...}
= false
endHere return children.isEmpty
}
```

So the sequence being removed has ended. That means that whatever
Trie you’re on should have its `endHere`

set to `false`

. To the
user of the Trie, that’s all that matters: from now on, if the contains
method on that Trie is used with that sequence, it will return
false.

However, to find out if you can delete the data itself, it returns
`children.isEmpty`

.
If it has no children, it does not hold any other sequences or
information, so it can be deleted.

Now for inside the if block:

```
[head]?.remove(g) == true else { return false }
guard children.removeValueForKey(head)
childrenreturn !endHere && children.isEmpty
```

So it calls `remove`

on the child
Trie corresponding to `head`

. That
guard statement will fail for two distinct reasons: if `children`

doesn’t contain `head`

, then the sequence being removed
wasn’t in the Trie in the first place. The method will then return
false, so that no removal or mutation is done.

If it *does* contain `head`

, but the Bool returned from the
remove method is `false`

, that
means that its *child* is not removable, so it is also not
removable, so it should return `false`

.

Otherwise, it will remove that member (`children.removeValueForKey(head)`

).
Then, the Trie can decide whether or not it itself is removable: `return !endHere && children.isEmpty`

.
If the `endHere`

is set to true,
then it is the end of some sequence: it is not removable. Otherwise,
it’s removable if it has no children. Here’s the whole thing, with its
public version:

```
{
extension Trie private mutating func remove<
: GeneratorType where G.Element == Element
G >(var g: G) -> Bool { // Return value signifies whether or not it can be removed
if let head = g.next() {
[head]?.remove(g) == true else { return false }
guard children.removeValueForKey(head)
childrenreturn !endHere && children.isEmpty
}
= false
endHere return children.isEmpty
}
<
public mutating func remove: SequenceType where S.Generator.Element == Element
S >(seq: S) {
remove(seq.generate())
}
}
```

That was a little heavy. And kind of ugly. Let’s lighten things up
for a second, with one of the loveliest `count`

properties I’ve seen:

```
{
extension Trie var count: Int {
public return children.values.reduce(endHere ? 1 : 0) { $0 + $1.count }
}
}
```

All it’s really doing is counting the instances of a `true`

`endHere`

. If the current Trie is an end,
then it knows that it adds one to the count (`endHere ? 1 : 0`

),
and it adds that to the sum of the counts of its children.

Now then. `SequenceType`

. Getting
tree-like structures to conform to `SequenceType`

is a bit of a pain,
mainly because of their recursiveness. Getting a linear representation
is easy enough:

```
{
extension Trie var contents: [[Element]] {
public return children.flatMap {
(head: Element, child: Trie<Element>) -> [[Element]] in
.contents.map { [head] + $0 } + (child.endHere ? [[head]] : [])
child}
}
}
```

And then you could just return the generate method from that for your Trie’s generate method.

The problem is that it’s not very proper: you’re translating your data structure into another data structure just to iterate through it. What you really want is something that generates each element on demand.

But it gets ugly quick. You’ve got to do a lot of stuff by hand which
it isn’t nice to do by hand, and you’ve got to employ some dirty tricks
(like using closures as a kind of homemade `indirect`

). At any rate, here it is:

```
<Element : Hashable> : GeneratorType {
public struct TrieGeneratorprivate var children: DictionaryGenerator<Element, Trie<Element>>
private var curHead : Element?
private var curEnd : Bool = false
private var innerGen: (() -> [Element]?)?
private mutating func update() {
let (head, child) = children.next() else { innerGen = nil; return }
guard = head
curHead var g = child.generate()
= {g.next()}
innerGen = child.endHere
curEnd }
next() -> [Element]? {
public mutating func for ; innerGen != nil; update() {
if let next = innerGen!() {
return [curHead!] + next
} else if curEnd {
= false
curEnd return [curHead!]
}
}
return nil
}
private init(_ from: Trie<Element>) {
= from.children.generate()
children update()
}
}
```

It’s got a similar logic to the lazy flatMap I did from a while ago.

The code is all available here, as a playground, or here, in SwiftSequence, where it’s accompanied by some tests.