Fun with Combinators
There are a bunch of “minimal” computational models out there: Turing machines, lambda calculus, PowerPoint (Wildenhain 2017), etc. These are radically simple languages which are nonetheless Turing complete, so theoretically “as powerful” as each other. Of those, lambda calculus is without question my favourite to actually write programs in: it’s the one which is closest to crawling out of the Turing tarpit.
In terms of implementation, though, it is far from simple. Lambda calculus has variables, which introduce huge complexity into the interpreter: especially if you want to do any kind of formal reasoning about programs, this complexity is a problem. We might want to reach for something even lower-level than lambda calculus: this is where combinator calculi come in.
You may have heard of SKI combinator calculus: it’s the “simplest” of the calculi, but it’s not actually very easy to understand, and it’s absolute murder to try to use. So we’re going to start with BCKW, a more obscure calculus, actually invented by Haskell Curry.
There are 4 combinators in BCKW: B, C, K, and W (shocking, I know).
You can think about these combinators as functions which manipulate the beginning of strings:
Bxyz ~> x(yz)
Cxyz ~> xzy
Kxy ~> x
Wxy ~> xyy
Upper case letters are combinators, lower-case are variables. Yes, yes, I know I said that combinator calculi didn’t need variables, and they don’t! I’m just using them here to explain how each of the combinators works. If you really want to be pedantic you can think of the lower-case letters as notational placeholders meaning “any given combinator”. They won’t exist in any actual programs we write.
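If it helps to see the rules in a more familiar setting, here’s a sketch of the four combinators as ordinary Haskell functions (lowercased, since Haskell reserves capitalised names for constructors):

-- The four BCKW combinators as plain Haskell functions.
b :: (q -> r) -> (p -> q) -> p -> r
b x y z = x (y z)   -- Bxyz ~> x(yz): composition

c :: (p -> q -> r) -> q -> p -> r
c x y z = x z y     -- Cxyz ~> xzy: flip

k :: p -> q -> p
k x y = x           -- Kxy ~> x: const

w :: (p -> p -> q) -> p -> q
w x y = x y y       -- Wxy ~> xyy: duplication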
Let’s work with some examples to get a sense for how these combinators work.
The simplest combinator is K: it’s actually equivalent to the const function from Haskell. It discards its second argument, and returns the first. If you give a combinator more arguments than it usually accepts, you just keep the extra arguments in the output:
Kxyz ~> xz
W is the next combinator: it duplicates its second argument.
Wxy ~> xyy
We always start from the left, applying the rule for the left-most combinator first.
WKxyz ~> Kxxyz ~> xyz
KWxyz ~> Wyz ~> yzz
Next we have C: this is equivalent to the Haskell function flip. It swaps the second and third arguments:
Cxyz ~> xzy
Here’s a small evaluator for expressions which use C, K, and W. You can edit the expression, and press enter to step through it.
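Since the interactive evaluator can’t be embedded in text, here’s a rough Haskell sketch of how such a single-step reducer might work (the Expr type and step function are my own illustration, not the code behind the original widget; it also handles B, which is introduced just below, and it only reduces at the head of the expression, which is all the examples here need):

-- A one-step reducer for BCKW expressions.
data Expr = B | C | K | W | App Expr Expr
  deriving Show

-- Split an expression into its leftmost combinator and its arguments.
spine :: Expr -> (Expr, [Expr])
spine (App f x) = let (h, xs) = spine f in (h, xs ++ [x])
spine e         = (e, [])

-- Rebuild an application from a head and a list of arguments.
unspine :: Expr -> [Expr] -> Expr
unspine = foldl App

-- Apply the rule for the leftmost combinator, if it has enough arguments.
step :: Expr -> Maybe Expr
step e = case spine e of
  (B, x : y : z : rest) -> Just (unspine x (App y z : rest))  -- Bxyz ~> x(yz)
  (C, x : y : z : rest) -> Just (unspine x (z : y : rest))    -- Cxyz ~> xzy
  (K, x : _ : rest)     -> Just (unspine x rest)              -- Kxy  ~> x
  (W, x : y : rest)     -> Just (unspine x (y : y : rest))    -- Wxy  ~> xyy
  _                     -> Nothing                            -- stuck: no redex

-- For example, step (App (App W K) K) gives Just (App (App K K) K):
-- WKx ~> Kxx, with x = K.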
The last combinator introduces parentheses, and it’s equivalent to function composition.
Bxyz ~> x(yz)
You can write parentheses yourself: implicitly, all expressions are left-associative. That means that the following are all equal:
xyz = (xy)z = (x)yz = ((x)y)z
But xyz is not equal to, say, x(yz).
And here’s a puzzle to start flexing your combinator skills: one of the combinators in SKI combinator calculus is I, which is the identity function.
Ix ~> x
Try to write an expression which functions the same way as I, using only the BCKW combinators. Use the following evaluator to try to figure out how to do it: write an expression after λ> which functions the same as I.
Answer
CK followed by any combinator will do the trick: for instance, CKCx ~> KxC ~> x. So CKB, CKK, CKC, etc.
I = CKC
Update 19/10/2020: A few people have pointed out (Joachim Breitner was the first) that there is a shorter solution to this problem: WK. I tend to prefer solutions that don’t include W, since then we’re working in a subset of the language that is both terminating and affine; although in this case the reason I didn’t mention WK is that I just didn’t find it myself.
Why Not Simpler Combinators?
The combinators we’ve defined so far all behave a little strangely: they seem to skip over their first argument, and work on their second. Indeed, there is another, equivalent combinator calculus which doesn’t have this peculiarity:
Bxyz ~> x(yz)
Axy ~> y
Mx ~> xx
Txy ~> yx
B stays the same in this calculus, but the rest of the combinators get switched out for seemingly simpler versions. K goes to A¹:
Axy ~> y
Kxy ~> x
Which isn’t a huge change. It’s the other two where we see the real difference. W has been swapped out for M:
Wxy ~> xyy
Mx ~> xx
As you can see, W basically does the same thing as M, but while passing through its first argument. The difference between T and C is similar:
Cxyz ~> xzy
Txy ~> yx
So, first of all, it is pretty simple to show that BCKW contains all of the BAMT combinators. Try to find a way to write T using only BCKW combinators (hint: you might want to use your previous answer for writing I using BCKW).
Answer
So in fact all of the changed BAMT combinators can be encoded using BCKW by putting I (or CKC or what have you) after the corresponding BCKW combinator: for instance, CIxy ~> Iyx ~> yx, which is exactly T. In other words:
T = CI = C(CKC)
A = KI = K(CKC)
M = WI = W(CKC)
It’s pretty easy to go from BCKW to BAMT, then. However, it’s extremely difficult to go the other way. Here, try to write K in terms of BAMT (this is quite difficult, do not expect to get it!):
Answer
Either of the following would work:
B(TA)(BBT)
B(B(TA)B)T
To see why the first behaves like K: B(TA)(BBT)xy ~> TA(BBTx)y ~> BBTxAy ~> B(Tx)Ay ~> Tx(Ay) ~> Ayx ~> x.
So this is why we will stick to BCKW for the time being: BAMT is just too painful to use.
Linear Types and Combinators
One of the things BCKW has over SKI is that each combinator represents a concrete capability. K and W especially: without these combinators, we can neither duplicate nor discard variables. This makes the languages without one or both of these interesting (albeit not Turing-complete).
If we say that we can’t use W, we know that the program will not duplicate any input. In fact, encoded appropriately, we know that the program can only decrease in size through execution. The BCK system is in fact an encoding of affine logic, which is all the rage nowadays. Rust uses affine types to guarantee memory safety: by preventing duplication of references, you can know that whenever you’re looking at a variable you’re free to modify it, or destroy it if necessary (obviously Rust is a bit more complex than what I’ve described here, but BCK is indeed the fundamental basis for the system in the same way that SK can be the basis for any programming language).
If we remove K as well we have a linear language. This is even more restrictive, but is also quite actively researched at the moment: linear types have been used to construct languages for differential privacy, for instance.
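For a flavour of what this looks like at the type level, GHC (9.0 and later) has a LinearTypes extension, where an argument marked %1 must be used exactly once. This is just an illustrative sketch, not something from the combinator language above:

{-# LANGUAGE LinearTypes #-}

-- Fine: the linear argument is used exactly once.
linearId :: a %1 -> a
linearId x = x

-- Rejected by the typechecker: discards its linear argument, like K.
-- constK :: a -> b %1 -> a
-- constK x y = x

-- Also rejected: duplicates its linear argument, like W.
-- dup :: a %1 -> (a, a)
-- dup x = (x, x)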
There’s one small issue with BC: it doesn’t (strictly speaking) have an equivalent to I. You can write an expression which is close, but it will only actually compute when applied to at least 3 arguments. See if you can find it.
Answer
BCC: applied to three arguments, BCCxyz ~> C(Cx)yz ~> Cxzy ~> xyz.
Usually we add I, though, to give us BCI.
The Minimal Combinators: S and K
S is the only combinator we haven’t seen yet. It’s kind of a combination of B, C, and W:
Sxyz ~> xz(yz)
It does parenthesising, reordering, and duplication. This allows it to be powerful enough to be Turing complete with only the addition of K. Try first to construct I given only S and K:
Answer
SK followed by any combinator will suffice: for instance, SKKx ~> Kx(Kx) ~> x.
I = SKK = SKS
And now construct S from BCKW:
Answer
S = B(BW)(BBC) = B(B(BW)C)(BB)
Of course, to show that SK is universal we’d need to show that it contains one of the other universal systems. We won’t do that exhaustively here, but first just try to figure out B and W:
Answer
B = S(KS)K
Answer
W = SS(SK) = SS(KI)
Recursion
The next task is to encode the Y combinator. This is a combinator that evaluates to the following:
Yf ~> f(Yf)
As you can see, it encodes recursion. Like the fix function in Haskell, this combinator allows us to do recursion without explicit self-reference. And, of course, we can define this combinator using the combinators we’ve seen before, since our language is Turing complete. One encoding is BM(CBM):
As you can see, BM(CBM), when applied to f, yields f(M(CBMf)): BM(CBM)f ~> M(CBMf) ~> CBMf(CBMf) ~> BfM(CBMf) ~> f(M(CBMf)). This is equivalent to f(BM(CBM)f) (the B just hasn’t been applied inside the f). So this is indeed a proper recursion combinator.
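In Haskell, this is exactly what fix (from Data.Function) gives us. Here’s a small usage sketch of recursion without self-reference:

import Data.Function (fix)

-- fix f = f (fix f), just like Yf ~> f(Yf).
-- The recursive call arrives as the 'rec' argument.
factorial :: Integer -> Integer
factorial = fix (\rec n -> if n <= 1 then 1 else n * rec (n - 1))

-- factorial 5 evaluates to 120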
Encoding Numbers
Let’s try doing a little bit of programming with these combinators now.
In the lambda calculus, to encode numbers we often use the Church numerals: that’s what we’re going to do here, too. A Church numeral representing some number n is a function which takes two arguments, and applies the first argument to the second n times. Here are some Church numerals in Haskell:
zero :: (a -> a) -> a -> a
zero f x = x
one :: (a -> a) -> a -> a
one f x = f x
two :: (a -> a) -> a -> a
two f x = f (f x)
three :: (a -> a) -> a -> a
three f x = f (f (f x))
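To check that these behave like numbers, we can hand a numeral the real successor function and zero (toInt here is my own helper, not part of the encoding):

-- Convert a Church numeral back to an ordinary Int by applying
-- (+ 1) to 0 the encoded number of times.
toInt :: ((Int -> Int) -> Int -> Int) -> Int
toInt n = n (+ 1) 0

-- toInt zero  evaluates to 0
-- toInt three evaluates to 3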
Encoding these numerals in combinators is a little more difficult. Zero and one are obvious: they are A and I, respectively. Try to figure out two and three:
Answer
WB (check: WBfx ~> Bffx ~> f(fx))
Answer
SB(WB) (here SB acts as the successor function: SBnf ~> Bf(nf), which applies f one extra time)
It turns out that it’s pretty easy to encode numbers in a relatively small amount of space, using a binary encoding. First, multiplication on Church numerals is simply composition: so that’s B on our combinators. We already have 2 defined, so the next thing we need for a binary encoding is a successor function. And we know what that is, from the answer to 3!
This means we can encode a number n in O(log n) space (although it still takes linear time to evaluate). The following REPL allows for numbers:
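Here’s a sketch of how such numeric literals might be expanded into combinator strings, peeling the number apart one bit at a time (my own illustration, not necessarily what the original page does):

-- Build a combinator string for the Church numeral n, using O(log n)
-- symbols: B(WB) doubles a numeral (multiplication by two), and SB
-- takes the successor.
encode :: Integer -> String
encode 0 = "A"   -- zero: Axy ~> y
encode 1 = "I"   -- one: the identity on f
encode n
  | even n    = "B(WB)(" ++ encode (n `div` 2) ++ ")"
  | otherwise = "SB(" ++ encode (n - 1) ++ ")"

-- encode 4 gives "B(WB)(B(WB)(I))", which computes the same numeral
-- as shorter hand-written forms.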
We could take up even less space if we allowed for non-normal forms. 4, for instance, could be encoded like so:
M(WB)
But we generally prefer to keep our encodings in normal form: otherwise there’s some extra evaluation we have to pay for when we go to use them.
Encoding Lambda Terms as Combinators
Once upon a time SKI combinators were used as a target for functional compilers: Miranda, Haskell’s precursor, compiled down to a set of combinators which included SKI. Nowadays, Haskell is compiled to the “spineless tagless G-machine”: its compilation technique took over from combinators in the late 80s, and has been the dominant form since. Apparently the reason is that, on the current architecture of most computers, combinator-based compilation targets just aren’t fast enough. They generate too much garbage: as a result, switching to the STG yielded about a 40% speedup.
A lot of this information comes from two talks, by the way:
- An Introduction to Combinator Compilers and Graph Reduction Machines, by David Graunke (2016), which goes through a high-level history and explanation of combinator compilers and why we switched away from them. A very interesting tidbit in this talk was that some people started making custom hardware to handle combinator calculi a little better. Even more interesting is the fact that these days we have FPGAs all over the place, so maybe combinator compilers are ripe for reintroduction?
- Combinators Revisited, by Edward Kmett (2018), which goes through a little more of the details of the problems with combinator compilers, and mentions some of the places in which we’re tantalisingly close to making combinator compilation work.
So compilation to combinators was once upon a time an extremely active area of research, but it has since fallen by the wayside a little because our current hardware is unable to evaluate it efficiently. What this means for us, though, is that there’s a large body of work on how to compile lambda terms to combinators!
We use the following basic combinator set for compilation: SKIBC. S is really the most important one here: of course we only need it and K, but we use I because it dramatically simplifies the expressions we generate, and we use B and C because they are special cases of S, as we’ll see in a second. The translation works like so:
\x. e1 e2 -> S (\x. e1) (\x. e2)
\x. x -> I
\x. e, x ∉ e -> K e
The translation works bottom-up. We’re only interested in removing the lambdas: combinator calculus does have application, after all, so there’s nothing we need to do in that case. For that reason, the algorithm is often called “abstraction elimination”, and it’s the one that pointfree.io uses to automatically pointfree Haskell expressions.
There are three forms of abstraction: abstraction into an expression which is an application, abstraction which returns its argument, and abstraction which returns something other than its argument. In the first case, we use S to pass the argument down each branch of the abstraction. In the second, we just use I. And in the third case, we use K to just ignore the argument. We won’t ever get \x. \y. e, since the algorithm works bottom-up, so the \y. e is eliminated before looking at the \x. \y. e.
B and C work like special cases of S: when we pass x down both branches of the application in the first case, sometimes that work is unnecessary. Sometimes one of the branches doesn’t use the passed variable: in this case, we use B or C, depending on which branch ignores the variable.
\x. e1 e2, x ∉ e1 -> B e1 (\x. e2)
\x. e1 e2, x ∉ e2 -> C (\x. e1) e2
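Putting the pieces together, here’s a sketch of the whole translation in Haskell, including the B and C special cases (the Term type and the function names are my own, not from any particular compiler):

-- Lambda terms extended with the target combinators.
data Term
  = Var String        -- a variable
  | App Term Term     -- application
  | Lam String Term   -- lambda abstraction
  | S | K | I | B | C -- the combinators we compile into
  deriving Show

-- Does the variable x occur free in the term?
occurs :: String -> Term -> Bool
occurs x (Var y)   = x == y
occurs x (App f a) = occurs x f || occurs x a
occurs x (Lam y e) = x /= y && occurs x e
occurs _ _         = False

-- Eliminate every lambda, working bottom-up.
compile :: Term -> Term
compile (App f a) = App (compile f) (compile a)
compile (Lam x e) = abstract x (compile e)
compile e         = e

-- Eliminate a single lambda: abstract x e is \x. e, combinator-style.
abstract :: String -> Term -> Term
abstract x e
  | not (occurs x e)  = App K e                       -- \x. e, x ∉ e -> K e
abstract _ (Var _)    = I                             -- \x. x -> I
abstract x (App f a)
  | not (occurs x f)  = App (App B f) (abstract x a)  -- B f (\x. a)
  | not (occurs x a)  = App (App C (abstract x f)) a  -- C (\x. f) a
  | otherwise         = App (App S (abstract x f)) (abstract x a)
abstract x (Lam y e)  = abstract x (abstract y e)     -- shouldn't arise: inner lambdas go first
abstract _ e          = App K e                       -- unreachable: bare combinators never contain x

-- For example, compile (Lam "x" (Lam "y" (Var "x"))) gives
-- App (App B K) I, i.e. BKI, which behaves like K: BKIxy ~> K(Ix)y ~> Ix ~> x.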
There is one issue with this approach: it produces combinator expressions which can be exponentially larger than the corresponding lambda expression. With some tricks (like our usage of C and B) we can get that down to roughly quadratic, but that’s still a pretty unpleasant size increase.
The issue is that we’re basically passing the arguments as a singly-linked list, where naive access is O(n), and more sophisticated access is O(log n).
Oleg Kiselyov wrote a paper (2018) on getting this down further, with some memoisation. There’s also a blog post (Lynn 2018), describing how to get that conversion without memoisation, and an online implementation here.
Conclusion
That’s all for this post! I’ll probably write more about combinators in the future: they’re an extremely interesting subject, and a lot of fun to mess around with as puzzles. One thing that I haven’t mentioned is the connection between combinators and concatenative languages: it turns out the two are pretty much the same thing! Maybe I’ll look at it in a future post.
¹ If you want to look up these combinators elsewhere, A is the only one you won’t be able to find: it’s much less common than K, and where I have found it people just call it K, so I had to pick a different letter to distinguish it.