## Fun with Combinators

Posted on October 17, 2020
Tags: Combinators

There are a bunch of “minimal” computational models out there: Turing machines, lambda calculus, PowerPoint (Wildenhain 2017), etc. These are radically simple languages which are nonetheless Turing complete, so theoretically “as powerful” as each other. Of those, lambda calculus is without question my favourite to actually write programs in: it’s the one which is closest to crawling out of the Turing tarpit.

In terms of implementation, though, it is far from simple. Lambda calculus has variables, which introduce huge complexity into the interpreter: especially if you want to do any kind of formal reasoning about programs, this complexity is a problem. We might want to reach for something even lower-level than lambda calculus: this is where combinator calculi come in.

You may have heard of SKI combinator calculus: it’s the “simplest” of the calculi, but it’s not actually very easy to understand, and it’s absolute murder to try use. So we’re going to start with BCKW, a more obscure calculus, actually invented by Haskell Curry.

There are 4 combinators in BCKW: B, C, K, and W (shocking, I know). You can think about these combinators as functions which manipulate the beginning of strings:

Bxyz ~> x(yz)
Cxyz ~> xzy
Kxy  ~> x
Wxy  ~> xyy

Let’s work with some examples to get a sense for how these combinators work.

Upper case letters are combinators, lower-case are variables. Yes, yes, I know I said that combinator calculi didn’t need variables, and it doesn’t! I’m just using them here to explain how each of the combinators work. If you really want to be pedantic you can think of the lower case letters as notational placeholders meaning “any given combinator”. They won’t exist in any actual programs we write.

The simplest combinator is K: it’s actually equivalent to the const function from Haskell. It discards its second argument, and returns the first. If you give a combinator more arguments than it usually accepts, you just keep the extra arguments in the output:

Kxyz ~> xz

W is the next combinator: it duplicates its second argument.

Wxy ~> xyy

We always start from the left, applying the rule for the left-most combinator first.

WKxyz ~> Kxxyz ~> xyz
KWxyz ~> Wyz   ~> yzz

Next we have C: this is equivalent to the Haskell function flip. It swaps the second and third arguments:

Cxyz ~> xzy

Here’s a small little evaluator for expressions which use C, K, and W. You can edit the expression, and press enter to step through it.

The last combinator introduces parentheses, and it’s equivalent to function composition.

Bxyz ~> x(yz)

You can write parentheses yourself: implicitly, all expressions are left-associative. That means that the following are all equal:

xyz = (xy)z = (x)yz = ((x)y)z

But xyz is not equal to, say, x(yz).

And here’s a puzzle to start flexing your combinator skills: one of the combinators in SKI combinator calculus is I, which is the identity function.

Ix ~> x

Try write an expression which functions the same way as I, using only the BCKW combinators. Use the following evaluator to try and figure out how to do it: write an expression after λ> which functions the same as I.

Answer CK followed by any combinator will do the trick. So CKB, CKK, CKC, etc.

I = CKC
Update 19/10/2020: A few people have pointed out (Joachim Breitner was the first) that there is a shorter solution to this problem: WK. I tend to prefer solutions that don’t include W, since then we’re working in a subset of the language that is both terminating and affine; although in this case the reason I didn’t mention WK is that I just didn’t find it myself.

# Why Not Simpler Combinators?

Each of the combinators we’ve defined so far work a little weird: they seem to skip over their first argument, and work on their second. Indeed, there is another, equivalent combinator calculus which doesn’t have this peculiarity:

Bxyz ~> x(yz)
Axy  ~> y
Mx   ~> xx
Txy  ~> yx

B stays the same in this calculus, but the rest of the combinators get switched out for seemingly simpler versions. K goes to A1:

Axy ~> y
Kxy ~> x

Which isn’t a huge change. It’s the other two where we see the real difference. W has been swapped out for M:

Wxy ~> xyy
Mx  ~> xx

As you can see W basically does the same thing as M, but while passing through its first argument. The difference between T and C is similar:

Cxyz ~> xzy
Txy  ~> yx

So, first of all, it is pretty simple to show that BCKW contains all of the BAMT combinators. Try find a way to write T using only BCKW combinators (hint: you might want to use your previous answer for writing I using BCKW).

Answer So in fact all of the changed BAMT combinators can be encoded using BCKW by putting I (or CKC or what have you) after the corresponding BCKW combinator. In other words:

T = CI = C(CKC)
A = KI = K(CKC)
M = WI = W(CKC)

It’s pretty easy to go from BCKW to BAMT, then. However, it’s extremely difficult to go the other way. Here, try to write K in terms of BAMT (this is quite difficult, do not expect to get it!):

Answer Either of the following would work:

B(TA)(BBT)
B(B(TA)B)T

So this is why we will stick to BCKW for the time being: BAMT is just too painful to use.

# Linear Types and Combinators

One of the things BCKW has over SKI is that each combinator represents a concrete capability. K and W especially: without these combinators, we can neither duplicate nor discard variables. This makes the languages without one or both of these interesting (albeit not Turing-complete).

If we say that we can’t use W, we know that the will not duplicate any input. In fact, encoded appropriately, we know that the program can only decrease its size through execution. The BCK system is in fact an encoding of affine logic, which is all the rage nowadays. Rust uses affine types to guarantee memory safety: by preventing duplication of references, you can know that whenever you’re looking at a variable you’re free to modify it, or destroy it if necessary (obviously Rust is a bit more complex than what I’ve described here, but BCK is indeed the fundamental basis for the system in the same way that SK can be the basis for any programming language).

If we remove K as well we have a linear language. This is even more restrictive, but is also quite actively researched at the moment: linear types have been used to construct languages for differential privacy, for instance.

There’s one small issue with BC: it doesn’t (strictly speaking) have an equivalent to I. You can write an expression which is close, but it will only actually compute when applied to at least 3 arguments. See if you can find it.

BCC

Usually we add I, though, to give us BCI.

# The Minimal Combinators: S and K

S is the only combinator we haven’t seen yet. It’s kind of a combination of B, C, and W:

Sxyz ~> xz(yz)

It does parenthesising, reordering, and duplication. This allows it to be powerful to be Turing complete only with the addition of K. Try first to construct I given only S and K:

Answer SK followed by any combinator will suffice.

I = SKK = SKS

And now construct S from BCKW:

S = B(BW)(BBC) = B(B(BW)C)(BB)

Of course, to show that SK is universal we’d need to show that it contains one of the other universal systems. We won’t do that exhaustively here, but first just try to figure out B and W:

B = S(KS)K

S = SS(SK) = SS(KI)

# Recursion

The next task is to encode the Y combinator. This is a combinator that evaluates to the following:

Yf ~> f(Yf)

As you can see, it encodes recursion. Like the fix function in Haskell, this combinator allows us to do recursion without explicit self-reference. And, of course, we can define this combinator using the combinators we’ve seen before, since our language is Turing complete. One encoding is BM(CBM):

As you can see, BM(CBM), when applied to f, yields f(M(CBMf)), which is equivalent to f(BM(CBM)f) (the B just hasn’t been applied inside the f). So this is indeed a proper recursion combinator.

# Encoding Numbers

Let’s try doing a little bit of programming with these combinators now.

In the lambada calculus, to encode numbers we often use the church numerals: that’s what we’re going to do here, too. A church numeral representing some number $n$ is a function which takes two arguments, and applies the first argument to the second $n$ times. Here are some church numerals in Haskell:

zero :: (a -> a) -> a -> a
zero f x = x

one :: (a -> a) -> a -> a
one f x = f x

two :: (a -> a) -> a -> a
two f x = f (f x)

three :: (a -> a) -> a -> a
three f x = f (f (f x))

Encoding these numerals in combinators is a little more difficult. Zero and one are obvious: they are A and I, respectively. Try to figure out two and three:

AnswerWB

AnswerSB(WB)

It turns out that it’s pretty easy to encode numbers in a relatively small amount of space, using a binary encoding. First, multiplication on Church numerals is simply composition: so that’s B on our combinators. We already have 2 defined, so the next thing we need for a binary encoding is a successor function. And we know what that is, from the answer to 3!

This means we can encode normal number in $\mathcal{O}(\log n)$ space (although it still takes linear time to evaluate). The following repl allows for numbers:

We could take up even less space if we allowed for non-normal forms. 4, for instance, could be encoded like so:

M(WB)

But we generally prefer to keep our encodings in normal form: otherwise there’s some extra evaluation we have to pay for when we go to use them.

# Encoding Lambda Terms as Combinators

Once upon a time SKI combinators were used as a target for functional compilers: Miranda, Haskell’s precursor, compiled down to a set of combinators which included SKI. Nowadays, Haskell is compiled to the “spineless tagless G-machine”: its compilation technique took over from combinators in the late 80s, and has been the dominant form since. Apparently the reason is that, on the current architecture of most computers, combinator-based compilation targets just aren’t fast enough. They generate too much garbage: as a result, switching to the STG yielded about a 40% speedup.

A lot of this information comes from two talks, by the way:

• An Introduction to Combinator Compilers and Graph Reduction Machines, by David Graunke (2016), which goes through a high-level history and explanation of combinator compilers and why we switched away from them. A very interesting tidbit in this talk was that some people started making custom hardware to handle combinator calculi a little better. Even more interesting is the fact that these days we have FPGAs all over the place, so maybe combinator compilers are ripe for reintroduction?
• Combinators Revisited, by Edward Kmett (2018), which goes through a little more of the details of the problems with combinator compilers, and mentions some of the places in which we’re tantalisingly close to making combinator compilation work.

So compilation to combinators was once upon a time an extremely active area of research, but it has since fallen by the wayside a little because our current hardware is unable to evaluate it efficiently. What this means for us, though, is that there’s a large body of work on how to compile lambda terms to combinators!

We use the following basic combinator set for compilation: SKIBC. S is really the most important one here: of course we only need it and K, but we use I because it dramatically simplifies the expressions we generate, and we use B and C because they are special cases of S, as we’ll see in a second. The translation works like so:

\x. e1 e2 -> S (\x. e1) (\x. e2)
\x. x     -> I
\x. e     -> K e

The translation works bottom-up. We’re only interested in removing the lambdas: combinator calculus does have application, after all, so there’s nothing we need to do in that case. For that reason, the algorithm is often called “abstraction elimination”, and it’s the one the pointfree.io uses to automatically pointfree Haskell expressions.

There are three forms of abstraction: abstraction into an expression which is an application, abstraction which returns its argument, and abstraction which returns something other than its argument. In the first case, we use S to pass the argument down each branch of the abstraction. In the second, we just use I. And in the third case, we use K to just ignore the argument. We won’t ever get \x. \y. e, since the algorithm works bottom-up, so the \y. e is eliminated before looking at the \x. \y. e.

B and C work like special cases of S: when we pass x down both branches of the application in the first case, sometimes that work is unnecessary. Sometimes one of the branches doesn’t use the passed variable: in this case, we use B or C, depending on which branch ignores the variable.

\x. e1 e2, x ∉ e1 -> B e1 (\x. e2)
\x. e1 e2, x ∉ e2 -> C (\x. e1) e2

There is one issue with this approach: it produces combinator expressions which are of order $\mathcal{O}(n^3)$ larger than the corresponding lambda expression. With some tricks (like our usage of C and B) we can get that down to $\mathcal{O}(n^2)$, but that’s still a pretty unpleasant size increase.

The issue is that we’re basically passing the arguments as a singly-linked list, where naive access is $\mathcal{O(n^2)}$, and more sophisticated access is $\mathcal{O}(n)$.

Oleg Kiselyov wrote a paper (2018) on getting this down to $\mathcal{O}(n)$, with some memoisation. There’s also a blog post (Lynn 2018), describing how to get that conversion without memoisation in $\mathcal{O}(n \log n)$ time, and an online implementation here.

# Conclusion

That’s all for this post! I’ll probably write more about combinators in the future: they’re an extremely interesting subject, and a lot of fun as puzzles to mess around with. One thing that I haven’t mentioned is the connection between combinators and concatenative languages: it turns out that these two things are pretty much the same thing! Maybe I’ll look at it in a future post.

Graunke, David. 2016. “An Introduction to Combinator Compilers and Graph Reduction Machines.” St. Louis. https://www.youtube.com/watch?v=GawiQQCn3bk.

Kiselyov, Oleg. 2018. “$\lambda$ to SKI, Semantically.” In Functional and Logic Programming, ed by. John P. Gallagher and Martin Sulzmann, 33–50. Lecture Notes in Computer Science. Cham: Springer International Publishing. doi:10.1007/978-3-319-90686-7_3. http://okmij.org/ftp/tagless-final/ski.pdf.

Kmett, Edward. 2018. “Combinators Revisited.” Wesley Conference Centre, Sydney, Australia. https://yowconference.com/talks/edward-kmett/yow-lambda-jam-2018/combinators-revisited-5919.

Lynn, Ben. 2018. “Ben Lynn’s Online Garbage: Lambda the Penultimate.” Ben Lynn’s Online Garbage. https://benlynn.blogspot.com/2018/11/lambda-penultimate_16.html.

Wildenhain, Tom. 2017. “On the Turing Completeness of MS PowerPoint.” http://www.andrew.cmu.edu/user/twildenh/PowerPointTM/Paper.pdf.

1. If you want to look up these combinators elsewhere, this is the only one you won’t be able to find: it’s much less common than K, and where I have found it people just call it K, so I had to pick a different letter to distinguish it↩︎