Bear with me if I seem to be veering out of my lane (as they say nowadays), but let me ask: What is chess? If you play with a chess set in which a lost pawn has been replaced with a button, you’re violating tournament regulations but most people would say you’re still playing chess; the button, viewed from “inside” the game, is a pawn. Likewise, if you’re playing against your computer, the picture of a chessboard that you see on your screen is fake but the game itself is real. That’s because chess isn’t about what the pieces are made of, it’s about the rules that we follow while moving those pieces. Asking “Do pawns exist?”, meaning “Are there real-world objects that behave in accordance with the rules of chess?”, misses the point. If one of your pieces has been shoddily manufactured and spontaneously fractures, that doesn’t mean that your mental model of how chess pieces behave is flawed; it’s reality’s fault for failing to conform to your mental model.

You’ve probably already guessed the agenda behind my rambling about chess, but here it is explicitly: I claim that math (pure math, anyway) is as much a game as a science. The objects of mathematical thought, like the pieces in chess, are defined not by what they “are” but by the rules of play that govern them. The fact that in math the pieces exist only in our imaginations and the moves are mental events doesn’t make the rules any less binding. And even though the rules are human creations, once we’ve agreed to them, the answers to questions like “Is chess a win for the first player?” or “Is the Riemann Hypothesis true?” aren’t matters of individual opinion or group consensus; the answers to our questions are out of our hands, irrespective of whether we like those answers or even know what they are.^{1}

**PEOPLE AT A PARTY**

I’ll illustrate my point with a gem of discrete mathematics that can be traced back to a polymathic prodigy named Frank Ramsey who made deeply original contributions to math, philosophy, and economics before dying in 1930 at the age of 26. (For more about Ramsey see the Anthony Gottlieb article and the Cheryl Misak book listed in the References.) I learned most of what I know about the body of mathematical work known today as Ramsey Theory from a great book I read back in the 1980s by Ronald Graham (who died earlier this month; more about him later) and his coauthors Bruce Rothschild and Joel Spencer. Two good article-length introductions to the topic (both originally published in Scientific American) are Martin Gardner’s 1977 article and Graham and Spencer’s 1990 article, listed in the References.

The mathematical gem is a puzzle often phrased in terms of a party attended by six people. Must it be true that you can find three of the six who are mutual acquaintances or three of the six who are mutual nonacquaintances?^{2} That formulation of the puzzle can lead to confusion, hinging on questions like “If A is acquainted with B, must B be acquainted with A?” (answer: for purposes of the puzzle, yes); “Isn’t it possible for there to be degrees of acquaintanceship, with no clear dividing line between acquaintances and nonacquaintances?” (answer: for purposes of this puzzle, it doesn’t matter where you draw the line, as long as you draw it somewhere); “Isn’t being acquainted time-dependent?” (answer: sure, but the claim is that at any given moment you can find three mutual acquaintances or three mutual nonacquaintances).

If you find such real-world issues distracting, you might prefer a pictorial model. If we have six points (call them *vertices*^{3}) arranged in a regular hexagon, and we connect each pair of vertices by either a red line segment or a blue line segment (call these segments *edges*), must it be true that you can find three vertices that are joined up by red edges or three vertices that are joined up by blue edges?

Since it gets tiresome to keep saying “three vertices that are joined up by red edges” let’s call such a trio of vertices a “red triangle”, and likewise let’s call a trio of vertices that are joined by blue edges a “blue triangle”. We have to keep in mind that points where edges cross are not vertices, so the picture below does not actually contain what we mean by a red triangle. Likewise for blue triangles. In making these stipulations we’re not trying to legislate reality; we’re just saying how terms are defined in Ramsey theory.

Here’s a picture that I generated by putting in colors at random. There are 4 red triangles and 2 blue triangles. How quickly can you find them all (if that’s your kind of thing)?

Part of what makes the six-people puzzle tricky is that it’s just on the line between doable and undoable; if instead of six vertices you have a smaller number (or equivalently if there are fewer than six people at the party), finding three mutual acquaintances, or failing that three mutual nonacquaintances, might under some circumstances be an impossible task. For instance, if there’s a party consisting of two men who are acquaintances and two women who are acquaintances but neither of the men is acquainted with either of the women, then there won’t be three mutual acquaintances nor will there be three mutual nonacquaintances. You might enjoy trying to come up with a situation where there are five people at a party but there are no three mutual acquaintances and no three mutual nonacquaintances. For an answer expressed pictorially, see Endnote #4.

You might enjoy pondering the six-person party puzzle on your own for a bit, but if you’d like a hint to get you started, try this: Suppose that you are one of the people at the party. You look around at the other five people. If the majority of them are people you’re acquainted with, there must be at least three such people; mentally single them out in some way (imagine them wearing silly hats if you like). Alternatively, if the majority of the five are people you’re unacquainted with, there must be at least three such people; mentally single out three people you’re *not* acquainted with. Either way, consider the situation that prevails among the three people you’ve singled out and how they relate to one another and you in terms of who is acquainted with whom. If you’re still stumped, check Endnote #5.
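If you’d rather let a machine confirm the claim than reason it out, the puzzle is small enough to check exhaustively: *K*_{6} has only 15 edges, hence 2^{15} = 32,768 colorings. Here’s a quick sketch in Python (encoding red as 0 and blue as 1 is my own convention, not anything standard):

```python
from itertools import combinations, product

def has_mono_triangle(coloring, n):
    # coloring maps each edge (i, j) with i < j to 0 (red) or 1 (blue)
    return any(
        coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )

# every one of the 2**15 colorings of K6 contains a monochromatic triangle
edges = list(combinations(range(6), 2))
assert all(
    has_mono_triangle(dict(zip(edges, colors)), 6)
    for colors in product((0, 1), repeat=len(edges))
)

# ...but K5 admits a coloring with none: color the 5-cycle red and
# the diagonals (a pentagram, which is also a 5-cycle) blue
cycle = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)}
coloring5 = {e: 0 if e in cycle else 1 for e in combinations(range(5), 2)}
assert not has_mono_triangle(coloring5, 5)
```

The second half of the check is exactly the pictorial answer of Endnote #4: the red edges form a pentagon and the blue edges a pentagram, and neither contains a triangle.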

Although I used parties and colored drawings to anchor the puzzle in physical reality in a hopefully helpful way, the question isn’t a question about the real world. If you attend a party and you find two people who you deem to be neither acquainted nor unacquainted but rather “half-acquainted”, that doesn’t break the logic of the solution to the puzzle; it just means that the assumptions of the puzzle don’t apply to your party. Likewise, if you want to challenge the red/blue dichotomy underlying the puzzle by arguing that there are intermediate colors that some people call red and others call blue (an issue that actually arises in the real world for blue and green, which many people have trouble distinguishing) — then you’re doing psychology or optics but you’re not doing Ramsey theory, in much the same way that Alexander the Great in cutting the Gordian knot wasn’t doing knot theory.

**SIM**

One thing that complicates my tidy analogy between doing math and playing games is that sometimes people invent math to analyze games or invent games to exemplify math. An instance of this that I’ve already written about is the interplay between combinatorial games and Conway’s system of surreal numbers. I’ve also invented a game of my own called Swine in a Line for the purpose of illustrating a mathematical process called chip-firing (which itself is often called a game even though it isn’t one).

A game related to Ramsey Theory is the pencil game Sim, invented by mathematician Gustavus Simmons, played on a sheet of paper initially marked with six vertices arranged in a regular hexagon and no edges connecting them. You’ll need two colored pencils for this game, say red and blue. You and your adversary take turns connecting the vertices with red edges and blue edges (one of you drawing red edges, the other drawing blue) until either a red triangle or a blue triangle is created, at which instant the person who created the triangle loses. The Ramsey theorem I mentioned above — that once every edge has been colored red or blue, there must be a red triangle or a blue triangle (or both) — tells us that by the time all fifteen possible edges have been drawn, if not sooner, at least one such triangle must appear, so the game can’t end in a draw. It’s now known, thanks to a brute-force computer search that inventoried all positions and all moves, that the second player has a winning strategy, but nobody has been able to come up with a simple way to implement that strategy, short of consulting the computer’s annotated list of all positions and all moves.
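To make the rules concrete, here’s a minimal Sim “referee” — my own sketch, under my own conventions (vertices numbered 0 through 5, player 0 and player 1 alternating, player 0 moving first):

```python
def sim_loser(moves):
    """moves: a list of edges (pairs of vertices 0-5), drawn alternately
    by player 0 and player 1; returns the loser, or None if no one yet."""
    drawn = {0: set(), 1: set()}
    for turn, (a, b) in enumerate(moves):
        player = turn % 2
        drawn[player].add(frozenset((a, b)))
        # a player loses the instant their own edges close a triangle
        for c in range(6):
            if (c not in (a, b)
                    and frozenset((a, c)) in drawn[player]
                    and frozenset((b, c)) in drawn[player]):
                return player
    return None

# player 0 draws (0,1) and (1,2), then carelessly closes the triangle
assert sim_loser([(0, 1), (2, 3), (1, 2), (3, 4), (0, 2)]) == 0
assert sim_loser([(0, 1), (2, 3)]) is None
```

The draw-detection theorem guarantees that once all fifteen edges appear in `moves`, `sim_loser` can never return `None`.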

Ben Orlin (of Math With Bad Drawings fame) is writing a not-yet-named book about mathematical games, and Sim is one of the games he writes about. He actually prefers a slightly different version he calls “Jim Sim”, after his father Jim Orlin who came up with it; in Jim Sim, the six vertices are drawn at the corners of two nested triangles.

Ben prefers this arrangement because it cuts down on the number of places where edges cross other edges and create visual clutter. Personally, I prefer the extra symmetry of the standard version. But mathematically, the difference between ordinary Sim and Jim Sim is purely cosmetic. (Ben proposes a special rule to handle situations where a player has won without knowing it, allowing the losing player to win; it’s cute but mean.)

**PLAYING BY THE RULES**

In any game, it’s important to specify the rules as precisely as possible, but it’s hard to anticipate every conceivable misunderstanding. Some misunderstandings are innocent; others are mischievous. I still don’t know what to think about a misunderstanding in my own life that took place almost fifty years ago. According to a well-known result that long predates (and in a way prefigures) Ramsey theory and the game of Sim, tic-tac-toe always ends in a draw if both players play intelligently. At the age of twelve, I became aware of this fact and boasted to a six-year-old that I couldn’t lose at tic-tac-toe. She promptly “beat” me, playing X to my O, like this:

I wasn’t able to convince her that the game isn’t played this way; she thought I was just being a sore loser. Or was she trolling me?

A lot of math-trolling on the web about standard mathematical facts is based on a misunderstanding of the rules of the game. Some misunderstandings are innocent, others seem willful, and others are hard to classify. For instance, I once saw someone critique a valid algebraic proof by saying “But look, you added 0 to the left-hand side of the equation but not the right-hand side! In algebra, you always have to do the same thing to *both* sides!” I suspect that that person’s real quarrel is not with the mathematical establishment but with a teacher who didn’t describe the rules of algebra with sufficient clarity a decade or two ago.

**THE IMPOSSIBLE PROBLEM**

Whether you know it or not, we’ve been talking about graph theory. That’s the mathematical discipline that allows one to get rid of all the annoying real-world ambiguities inherent in the people-at-a-party scenario. If a mathematician at a party with six or more people, in an effort to be accessible, tells a non-mathematician the people-at-a-party puzzle, and the non-mathematician willfully and repeatedly tries to dissolve the puzzle by making real-world-based objections, what’s a peeved mathematician to do but to create a domain of the mind in which the mathematical solution is impervious to quibbles because the assumptions of the puzzle hold by definition?

In analytic geometry a graph is a picture representing a function like *y* = *mx* + *b* or *y* = *x*^{2}, but that’s a different context.^{6} In graph theory, a graph is a collection of points and a collection of segments joining those points, called vertices and edges respectively. One special sort of graph with *n* vertices contains all the possible edges joining them; this is called a *complete* graph on *n* vertices, sometimes abbreviated *K*_{n}. The way a graph theorist would ask the people-at-a-party puzzle is: given any red-blue coloring of the edges of a *K*_{6}, must there be a red *K*_{3} or a blue *K*_{3}? The answer is yes, and 6 is the smallest number of vertices for which this is guaranteed; in the standard notation, the Ramsey number *R*(3,3) is 6, where *R*(*a*,*b*) denotes the smallest *n* such that every red-blue coloring of the edges of *K*_{n} must contain a red *K*_{a} or a blue *K*_{b}.

What about other values of *a* and *b*? It’s long been known that *R*(4,4) is 18. That is, if you have a *K*_{18} and you color each of its edges red or blue, it must contain a red *K*_{4} or a blue *K*_{4}. Or (going back to the other model) if there is a party attended by 18 people, you can always find four people who are mutual acquaintances or four who are mutual nonacquaintances (or both). And 18 is the smallest number with this property; if you replace 18 by 17 the claim becomes false, as can be shown by analysis of the highly symmetrical figure below (where two circles are connected by a path if and only if the two associated people are acquaintances).
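A standard 17-person arrangement that does the job — very likely the pattern in that symmetrical figure — is the Paley construction: number the guests 0 through 16 and declare *i* and *j* acquainted exactly when *i* − *j* is a nonzero square mod 17 (this is symmetric, since −1 is itself a square mod 17). An exhaustive check of all C(17,4) = 2380 quadruples takes only a moment:

```python
from itertools import combinations

P = 17
squares = {(x * x) % P for x in range(1, P)}  # the 8 nonzero squares mod 17

def acquainted(i, j):
    # symmetric, because -1 is a square mod 17 (17 is 1 mod 4)
    return (i - j) % P in squares

# among the 17 guests there are no 4 mutual acquaintances
# and no 4 mutual nonacquaintances
for quad in combinations(range(P), 4):
    pairs = [acquainted(a, b) for a, b in combinations(quad, 2)]
    assert any(pairs) and not all(pairs)
```

So 17 people don’t suffice, and since 18 always do, *R*(4,4) is exactly 18.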

In an astonishingly original piece of work, Ramsey proved that this cutoff phenomenon is universal. Pick any positive integer *k* you like; once the number of people at the party becomes large enough, you can be sure that there will be *k* guests who are mutual acquaintances or *k* guests who are mutual nonacquaintances (or both). The hard part is figuring out what “large enough” means. When *k* is 3, large enough means 6 people; when *k* is 4, large enough means 18 people; and when *k* is 5, nobody knows exactly where the cutoff is, even though Ramsey proved that the cutoff exists! Back when Graham et al. wrote their book on Ramsey theory, all that was known about the Ramsey number *R*(5,5) was that it lay somewhere between 43 and 49 inclusive. In 2017, Vigleik Angeltveit and Brendan McKay were able to shave that 49 down to 48. Similarly, all we currently know about the Ramsey number *R*(6,6) is that it’s somewhere between 102 and 165 inclusive.

Part of what’s poignant about such Ramsey problems is that each one, taken individually, is a question that could in principle be solved by brute force on a computer capable of systematically examining the brutally large but still finite set of possibilities.^{7} Evelyn Lamb takes a look at the issue through the lens of Moore’s Law, and the results are discouraging. Even with Moore’s Law on our side, computers will never be powerful enough to pin down the Ramsey number *R*(6,6) unless we come up with a better way to think about it.

The mathematician Paul Erdős, an early pioneer of Ramsey theory, didn’t think the human race was smart enough to find a better way. He once asked an audience to imagine a scenario in which super-powerful aliens threaten humankind, saying they’ll destroy our species unless we compute some specified Ramsey number. In the case *R*(5,5), Erdős imagined that, if we threw all our computer power and brain power into the effort, with the threat of annihilation providing extra motivation, humanity could determine whether the Ramsey number is 43, 44, 45, 46, 47, or 48. In contrast, in the case of *R*(6,6), Erdős opined that our best chance of survival lay not in meeting the aliens’ challenge but in finding ways to defeat them militarily, because as tough as those aliens are, they can’t be as tough as Ramsey problems. In his book “Inevitable Randomness in Discrete Mathematics”, József Beck writes: “Ramsey Games represent a humiliating challenge for Combinatorics.”

Though Erdős’ parable is farfetched, I can think of something even more implausible: aliens who show up threatening to destroy us unless we prove or disprove the proposition that chess is a win for the first player. Let me be clearer about the scenario I’m discussing and disbelieving. I can swallow aliens who’ve studied our culture, including our quaint game of “chess”, and who taunt us by challenging us to prove something about it. What I *can’t* swallow is aliens who have come up with the game of chess *independently of us*. That’s because there are just too many possible chess-like games and not enough life-bearing planets in the galaxy.^{8} In contrast, I believe Ramsey theory gets invented on most planets that develop space-faring civilizations. Indeed, one of the animating impulses behind Ramsey theory can be experienced when one looks up at the night sky, observes geometric patterns in the stars, and wonders “How many of these seemingly meaningful patterns actually become inevitable once you have enough stars in your sky?”

**RON GRAHAM**

I had the pleasure of knowing Ron Graham. He may be best known to the public at large for having invented “Graham’s number”^{9} in the course of his work on Ramsey theory, but I was very influenced by other work of his (much of it joint with his wife, the mathematician Fan Chung) on quasirandom graphs. He was generous to me in various ways, of which I’ll mention three:

1) In the late 1980s Ron gave me an article to referee for a journal, saying “I think you’ll like it.” Spending a few weeks immersed in that article gave me ideas that were at the center of my research for more than a decade.

2) Around the same time Ron gave me a puzzle from his office called a Panex. It’s similar to the Tower of Hanoi puzzle, but harder. In the case of the Tower of Hanoi puzzle, which involves moving disks between three spindles, it’s known exactly how many moves are required to solve the problem when there are *n* disks. In contrast, the optimal solution to the general version of the Panex puzzle is still unknown. The Panex that Ron gave me is still in my office. If you want to give the puzzle a try, here’s a virtual version.

3) Back in the early 1990s Ron invited me to attend a conference I’d never heard of. It was a “Gathering 4 Gardner”, the second such gathering in an ongoing series, and these biennial meetings have been a big part of my mathematical life ever since.

Ron also gave me a suggestion thirty years ago that I hope to bring to fruition during the coming year. Visiting MIT in the mid-90s, Ron passed an elevator bay along one of the corridors near the math department and quipped that it ought to sport a sign that said “*L*(*η*)” (if you don’t get the joke, try saying it the way a mathematician would if *L* were a function and *η* were the argument to which the function was being applied). At the time I was just an assistant professor at MIT with no voice in departmental decor. But now that I am a full professor at UMass Lowell, I plan to put up an *L*(*η*) sign next to the elevator bank at our department’s new headquarters.

Ron was responsible for creating a lot of good math, and he also took responsibility for communicating it to the broadest possible audience (where the meaning of “broadest possible” of course depended on how technical the result was). Here’s one morsel I just learned about recently: Consider the operation of taking a string of decimal digits, inserting plus signs between some of them (for instance, 123 could become 12+3 or 1+23 or 1+2+3), and calculating the sum. If you keep performing this “insert-and-add” operation over and over, you eventually get stuck and the game ends; for instance 12+3 is 15 and 1+5 is 6 and the game is over, or 1+23 is 24 and 2+4 is 6 and the game is over, or 1+2+3 is just 6 right away. It’s easy to show that you always get down to a single-digit number, and it’s not much harder to show that you always get down to the same single-digit number no matter how you stick in the plus signs.

This phenomenon (based on the same logic that Martin Gardner describes in his article on digital roots) could be used as the basis of a magic trick that Ron would have liked (see his book with Diaconis in the References). Ask someone from the audience to arrange the digits 1 through 9 in any order, perform insert-and-add (with them, not you, picking where to insert the plus signs) over and over until a single-digit answer appears, and then have them open an envelope you gave them at the start of the trick, containing a piece of paper on which you’ve written “The final answer is 9.”

That’s a nice trick, but it’s not the morsel I’m talking about. Here’s the morsel: No matter what string of digits I give Ron, there’s a way for him to insert plus signs in such a way that after just three applications of insert-and-add, he’ll arrive at a single-digit answer.

My first thought when I read this claim was that I must have misunderstood it. If I start with a typical large number *n*, the sum of its digits is likely to be close to the logarithm of *n*, give or take a factor of 10.^{10} So Graham and his coauthors seem to be saying that if you start with any number *n*, however large, and take the logarithm three times in succession, you get an answer that’s ten or smaller. You don’t need Graham’s number to see that that’s ridiculous; even as comparatively puny a quantity as 10^{10^{10^{10}}} will do.

But do you see the mistake I made? Nobody said Ron has to insert plus signs everywhere. It’s true that if I want the very next number in the number-chain to be as small as possible, inserting plus signs everywhere is the way to go, but that strategy may show its short-sightedness further down the line; the race doesn’t always go to the strongest starter. For instance, consider the starting number 919; if Ron turns it into 91+9 instead of 9+1+9, he can get to the final answer 1 in two steps (91+9 = 100; 1+0+0 = 1) instead of three steps (9+1+9 = 19; 1+9 = 10; 1+0 = 1).
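Both routes through 919 can be traced with a toy implementation of the insert-and-add move — this is just my own sketch, not the Butler–Graham–Stong machinery:

```python
def insert_and_add(digits, cuts):
    """Split the digit string at the given cut positions and add the pieces;
    e.g. insert_and_add("919", [2]) computes 91 + 9."""
    pieces, prev = [], 0
    for cut in list(cuts) + [len(digits)]:
        pieces.append(int(digits[prev:cut]))
        prev = cut
    return str(sum(pieces))

# greedy route: plus signs everywhere takes three steps
assert insert_and_add("919", [1, 2]) == "19"   # 9+1+9
assert insert_and_add("19", [1]) == "10"       # 1+9
assert insert_and_add("10", [1]) == "1"        # 1+0

# Ron's route: two steps
assert insert_and_add("919", [2]) == "100"     # 91+9
assert insert_and_add("100", [1, 2]) == "1"    # 1+0+0
```

Since each move preserves the number’s value mod 9, every route from the same starting string ends at the same single digit, which is also why the envelope trick works.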

“Fair enough,” you might say, “But Ron got lucky that time; if he inserts plus signs into a string of digits at random, it won’t happen very often that the insert-and-add operation will lead to a number with so many zeroes.”

But wait: Ron doesn’t need this proliferation of zeroes to be something that happens at *random*. As long as there’s *some* way of inserting the plus signs to make lots of zeroes appear, that’s good enough for him. And there are exponentially many ways to insert those plus signs, so he has lots of ways to potentially win.

So maybe it’s not so surprising that if Ron plays the insert-and-add game wisely, he can arrive at a single-digit answer in fewer steps than if he naively puts in plus-signs everywhere. But just *three* steps? No matter how big the original number is? That’s kind of amazing. See the Butler, Graham, and Stong article for details. (It’s a bit more complicated than making sure you get lots of zeroes right away.)

To get a sense of Ron Graham as a person, I urge you to check out George Csicsery’s short video portrait of Graham, entitled “Something New Every Day”.

In addition to being a mathematician, Graham was also an avid juggler, and once said “The problem with juggling is, the balls go where you throw them, not where you wish they would go, or where they are supposed to go.” Once a ball leaves your grip, its trajectory is out of your hands. Graham might have added that, analogously, the problem with math is that it follows the rules you set up. Once you’ve launched a definition into the air, it goes where you sent it, and the theorem that lands in your hands, possibly much later, may be very different from what you expected. At times this discrepancy can be a source of frustration, but for the mathematician it can also be a source of delight.

*Thanks to Jeremy Cote, Sandi Gubin, David Jacobi, Andy Latto, Joe Malkevitch, Gareth McCaughan, Ben Orlin, Evan Romer, and Rich Schroeppel.*

**ENDNOTES:**

#1: Here I’m stressing the “internal” view of math. There is an equally important external view that situates math in historical and cultural context, dealing with such topics as “What questions get asked in the first place? Why these particular definitions and rules, and not others?” Nearly all mathematics has some real-world phenomenon as its point of departure, and much mathematics that has left the real world far behind can orbit back and have real-world applications. But that’s not what I’m talking about here.

#2: Here and elsewhere, “or” is meant in the inclusive, “and/or” sense; if you can find three mutual acquaintances *and* three mutual nonacquaintances among the six people, all the better!

#3: Martin Gardner used the plural noun “vertexes”, but I think this is less common nowadays. Have any of you seen this in recent publications? How about “matrixes” instead of “matrices”?

#4: Here’s a graph with 5 vertices whose edges have been colored red and blue, containing no red triangle and no blue triangle.

#5: Say the three people you’ve singled out are all people you’re acquainted with. If any two of them are acquaintances, then the two of them, along with you, form an acquainted threesome. On the other hand, if no two of them are acquaintances, then the three of them form an unacquainted threesome. Similar logic applies in the case where the three people you’ve singled out are all people you’re unacquainted with.

#6: The word “graph” is used in rather different senses in analytic geometry and discrete mathematics. Do mathematicians writing in languages other than English have to contend with this ambiguity? Let me know in the Comments.

We allow the edges to cross one another, but we don’t allow an edge to pass through extra vertices en route from one vertex to another. To make the pictures nicer, we sometimes allow the edges to curve, but we don’t allow a curved edge to join a vertex to itself, nor do we allow more than one edge between two particular vertices.

If you push mathematicians hard they will retreat even further from reality and say that a graph isn’t actually a picture at all, but rather an abstract collection of entities called vertices that are unspecified objects of pure thought, along with other entities called edges that are unordered pairs of vertices. So don’t push us.

#7: Remember the “find the 4 red triangles and 2 blue triangles” challenge from earlier? Solving it is just a matter of patience, not cleverness, since there are only twenty trios to try. If you disliked that puzzle, I don’t blame you; I’m more interested in problems that require intelligence. Most famous mathematical problems aren’t reducible to exhaustive examination of finitely many cases; for instance, both Fermat’s Last Theorem (solved) and the Riemann Hypothesis (still open) are “infinitary” in this sense.

#8: If you doubt this, take a look at all the variants listed in the Wikipedia page on chess variants and now imagine all the variants that *could* be listed there but aren’t. If nevertheless you think that chess stands out as one of the best of the bunch, for reasons that would apply throughout the galaxy, please explain!

If you’re wondering why I’m not worried about invasions from other galaxies, it’s because even though interstellar distances are large, intergalactic distances are *ridiculously* large. Any intergalactic being that tried to pop over to slake its thirst for conquest would become millions of years old and eventually forget why it had wanted to come to our galaxy in the first place. Multigenerational voyages wouldn’t solve the problem either: after the initial crew died, later generations would rebel against being cooped up in a spaceship on a journey they’d never get to finish, especially when the whole idea of the trip wasn’t theirs to begin with.

#9: Graham’s number is big. Really big. You just won’t believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it’d be hard to write out all the digits of nine raised to the ninth power nine times, but that’s just peanuts to Graham’s number. If you’re itching to know more, filmmaker Brady Haran has what you crave. Haran is the creator of the trilogy of mathematico-cinematic blockbusters “Graham’s Number”, “How Big Is Graham’s Number?”, and “Wait, What Is Graham’s Number Again?”, which he followed with a fourth video, entitled “Well, That About Wraps It Up For Graham’s Number”. (The actual names of some videos have been changed to be more like something Douglas Adams would have come up with.)

As far as I know, nobody has proposed a game playable by pan-dimensional beings in which the impossibility of the game ending in a draw is proved by the theorem of Graham and Rothschild that led Graham to invent his number. But if anyone did, the game would clearly deserve to be called “GRim” (in honor of Graham and Rothschild).

#10: Suppose *n* has *k* digits, and suppose that *n* contains each of the digits 0 through 9 in roughly equal proportion, so that the average of the digits is around 4.5; then the sum of the digits should be about 4.5 times *k*, which is about 4.5 times the base ten logarithm of *n*.
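This estimate is easy to test empirically; here’s a quick sketch (the 100-digit range and the seed are my own arbitrary choices):

```python
import math
import random

random.seed(2020)
n = random.randrange(10**99, 10**100)   # a random 100-digit number
digit_sum = sum(int(d) for d in str(n))
estimate = 4.5 * math.log10(n)          # about 4.5 * k for a k-digit number
# the digit sum hovers near the estimate (here, near 4.5 * 100 = 450)
assert abs(digit_sum - estimate) < 150
```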

**REFERENCES**

József Beck, Inevitable Randomness in Discrete Mathematics.

Alexander Bogomolny, “Ramsey’s Number R(4,4)” at cut-the-knot.org.

Anthony Bonato, “Breakthrough in Ramsey Theory”, The Intrepid Mathematician, April 5, 2017, https://anthonybonato.com/2017/04/05/breakthrough-in-ramsey-theory/ .

Steve Butler, Ron Graham, and Richard Stong, Inserting Plus Signs and Adding, The American Mathematical Monthly, 123(3), March 2016, 274-279, http://orion.math.iastate.edu/butler/papers/16_03_insert_and_add.pdf .

Fan Chung, About Ron Graham, http://www.math.ucsd.edu/~fan/ron/ .

F. R. K. Chung, R. L. Graham, and R. M. Wilson, “Quasi-random graphs”, Combinatorica, volume 9, pages 345–362 (1989), http://www.math.ucsd.edu/~fan/wp/quasirandom1.pdf .

George Csicsery, “Something New Every Day: The Math & Magic of Ron Graham”, Zala Films; https://vimeo.com/136044050 .

Martin Gardner, “Digital Roots”, in “The Second Scientific American Book of Mathematical Puzzles and Diversions”.

Martin Gardner, “Ramsey Theory”, chapter 17 of “Penrose Tiles to Trapdoor Ciphers… and the Return of Dr. Matrix”.

Martin Gardner, “Sim, Chomp, and Racetrack”, chapter 9 of “Knotted Doughnuts and Other Mathematical Entertainments”.

Anthony Gottlieb, “The Man Who Thought Too Fast”, The New Yorker, April 27, 2020, https://www.newyorker.com/magazine/2020/05/04/the-man-who-thought-too-fast .

Ronald Graham, Papers of Ron Graham, http://www.math.ucsd.edu/~ronspubs/ .

Ronald Graham, Bruce Rothschild, and Joel Spencer, “Ramsey Theory”, Wiley; available through https://www.wiley.com/en-us/Ramsey+Theory%2C+2nd+Edition-p-9780471500469 .

Ronald L. Graham and Joel H. Spencer, “Ramsey Theory”, Scientific American, Vol. 263, No. 1 (July 1990), pp. 112-117. http://www.math.ucsd.edu/~fan/ron/papers/90_06_ramsey_theory.pdf

Ronald Graham and Persi Diaconis, Magical Mathematics: The Mathematical Ideas That Animate Great Magic Tricks, Princeton University Press; available through https://press.princeton.edu/books/hardcover/9780691151649/magical-mathematics .

Evelyn Lamb, “Overthinking an Erdős Quote”, https://blogs.scientificamerican.com/roots-of-unity/moores-law-and-ramsey-numbers/

Cheryl Misak, “Frank Ramsey: A Sheer Excess of Powers”, Oxford University Press; available through https://global.oup.com/academic/product/frank-ramsey-9780198755357? .

The more you know, the more you forget.

The more you forget, the less you know.

So why study?

The less you study, the less you know.

The less you know, the less you forget.

The less you forget, the more you know.

So why study?

*— “Sophomoric Philosophy”*

Poor Oedipus! The mythical Theban started out life with every advantage a royal lineage could offer but ended up as the poster child for IFS: Inexorable Fate Syndrome. His parents packed him off in infancy to evade a prophecy that he’d kill his father and marry his mother. He was found on a mountain and raised by a shepherd, so Oedipus didn’t know who his birth parents were. Once he learned about the prophecy he did everything he could to avoid fulfilling it (aside from not killing or marrying anyone, which in those times would have been an undue hardship), but he still ended up doing exactly what he was trying not to do.

If the story of Oedipus seems a bit removed from real life, listen to episode 3 of Tim Harford’s podcast “Cautionary Tales”, titled “LaLa Land: Galileo’s Warning”, to hear about systems that were designed by intelligent, well-meaning people to avert disasters but which ended up causing disasters instead.

In Harford’s diagnosis, the problem is that in adding safeguards to a system we increase its complexity, which makes it harder for our feeble human minds to imagine all the ways in which the system might fail. Yet sometimes it’s not complexity that bites back at us but some simple variable that we’ve failed to take into account. For example, say you live on an island with too many wolves. The obvious solution is to encourage wolf-hunting. Unfortunately, suppressing the wolf population means there will be less predation of the deer population (Did I mention there are deer on this island?), which means the deer population will surge next year, which means there’ll be more young deer for mama wolves to feed to their cubs, and come the year after, there’ll be more wolves than ever.

And that’s if you’re lucky. If you’re unlucky, you succeed in killing all the wolves, which leads to an explosion of the deer population, which leads to irreversible overgrazing (Did I mention the grass?), and then your once-complex ecosystem becomes a wolf-free, deer-free, grass-free desert island.
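For readers who like to tinker: the wolf-deer loop of this parable is essentially the classical Lotka-Volterra predator-prey model. Here is a minimal simulation sketch, with every parameter value invented purely for illustration:

```python
# Hypothetical illustration only: the classical Lotka-Volterra predator-prey
# equations, with all parameter values invented for the sketch.

def simulate(deer, wolves, years, dt=0.001):
    """Euler-integrate deer' = deer*(a - b*wolves), wolves' = wolves*(d*deer - c)."""
    a, b, c, d = 1.0, 0.1, 1.5, 0.075   # made-up birth, predation, death, conversion rates
    for _ in range(int(years / dt)):
        d_deer = deer * (a - b * wolves) * dt     # deer multiply; wolves eat them
        d_wolves = wolves * (d * deer - c) * dt   # wolves thrive on deer, starve without them
        deer, wolves = deer + d_deer, wolves + d_wolves
    return deer, wolves

# Start just after a wolf cull: with predators scarce, the deer boom first,
# and the wolf population later rebounds on the strength of that boom.
print(simulate(deer=10.0, wolves=2.0, years=5))
```

With these (invented) numbers the two populations chase each other around a loop rather than settling down, which is exactly the “more deer, then more wolves than ever” dynamic of the parable.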

If this had been a real-world example instead of a parable, there would have been lots more species, and the island probably wouldn’t be an island but a region with porous borders that allow wolves, deer, and other animals to cross into the region and out of it. Maybe the overlooked variable wouldn’t have been the deer population but something subtler — something that in retrospect is hard to miss, but in advance is not so easy to pick out of a crowd of jostling variables. Here’s an example mentioned to me by pre-reader Shecky Riemann (can anyone provide a source for his story?). I quote Shecky’s version:

“In some New England town wildlife officials realized the Wood Duck population was inexplicably falling, while they noticed the raccoon population was growing; they believed raccoons were raiding Wood Duck nests for the eggs, causing the subsequent decline. So they hunted, or trapped and relocated, much of the raccoon population, only to find in time that the Wood Duck population declined *even more* precipitously… At that point they discovered that Wood Duck babies were indeed hatching out but upon leaving the nest and plopping into the water (as they do while very young, well before they can fly) they were immediately being taken by snapping turtles, whose population had ballooned because the only thing previously keeping them in check were the raccoons who ate snapping turtle eggs.”

The predator-prey systems I’ve described embody the idea of a negative feedback loop. This is a causal loop between two or more variables (let’s stick to just two and call them X and Y to keep things simple) in which making X bigger makes Y bigger (and making X smaller makes Y smaller) but making Y bigger makes X *smaller* (and likewise making Y smaller makes X bigger).^{1} For instance, say X is the severity of a pandemic and Y is the amount of care people take to prevent disease transmission. When the disease is ripping through a population, people get scared and take care, which after a while causes the number of new infections to drop. But then people get lax, causing the rate of new infections to rise again. To the extent that such a simple picture accurately captures key features of the SARS-CoV-2 pandemic in 2020, the right question (despite the magazine caption) is not “Will infections rise as states reopen their economies?” but “By how much?”
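If you want to see the oscillation for yourself, here is a toy discrete-time version of the infections-and-caution loop, with every coefficient made up: X pushes Y up, Y pushes X down, and the result is an overshoot, a crash, and a rebound.

```python
# Toy negative feedback loop (all coefficients invented):
# X = weekly infections, Y = public caution. X pushes Y up; Y pushes X down.

def step(x, y):
    x_next = x + 0.5 * x * (1.0 - y)    # infections grow when caution is low, shrink when high
    y_next = y + 0.3 * (x / 100.0 - y)  # caution lags behind the perceived threat level
    return x_next, y_next

x, y = 10.0, 0.0   # a small outbreak, no caution yet
history = []
for week in range(40):
    history.append(x)
    x, y = step(x, y)

# Print the infection curve at a few checkpoints to see the overshoot and rebound.
print([round(history[k]) for k in (0, 10, 17, 39)])
```

The overshoot-crash-rebound pattern is the qualitative signature of negative feedback acting with a delay: caution only catches up with infections after the damage is done, then overshoots in turn.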

Why do I call this essay “The mathematics of irony”? I won’t be so foolish as to say that all, or even most, irony comes from negative feedback loops. But a lot of irony comes from reversal of expectations, and the presence of negative feedback loops involving variables you’ve ignored can be one reason for reasonable interventions to backfire.

This isn’t news to anyone who studies complex systems. The problem is that scientists and science educators (and I include myself among their number) haven’t done a good enough job of explaining the complexity of reality, or of overcoming people’s desire for simple answers by exploiting their love of a good story. Too many members of the voting public see a sentence like “Population dynamics can be counterintuitive” as a cowardly equivocation or an outright lie, and are all too ready to throw in their lot with someone who runs on a simple, thrilling campaign slogan like “I will kill the wolves.”^{2}

Many physical systems can be understood through the lens of feedback, as seen in something as simple as a pendulum. If you pull the bob to the right, the rightward deflection gives rise to leftward acceleration, which gives rise to leftward velocity, which gives rise to leftward deflection, which gives rise to rightward acceleration, which gives rise to rightward velocity, which gives rise to rightward deflection, and so on.

Here’s a simplified overview of what happens. The four sketches in the middle show, in each of four illustrative cases, where the pendulum bob is (the dot) and where it’s going (the small arrow); the labels on the outskirts describe each case in words; and the big arrows along the outside show how the system evolves over time.

The mathematics governing this system, expressed in the form of a pair of mutually referential differential equations, has many similarities to the mathematics of an oscillating spring, or an inductor in an electrical circuit. In fact, if you ignore nonlinearities^{3} in the equations, you’ll find that the equations for a pendulum become formally identical to the equations for an electrical circuit; the quantities in one system (deflection, velocity) are different from the quantities in the other (current, voltage), but the way a given quantity in one system evolves over time is identical to the way the corresponding quantity in the other system evolves. The underlying schema is “the” simple harmonic oscillator, one of the stars of an undergraduate physics education.
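Here is a sketch of what “formally identical” means in practice: one little integrator, with an invented frequency, serves equally well for the linearized pendulum (read x as deflection) and for the electrical circuit (read x as charge).

```python
import math

# Semi-implicit Euler for the simple harmonic oscillator x'' = -omega**2 * x.
# The same schema covers the linearized pendulum (x = deflection angle)
# and an LC circuit (x = charge); only the names of the quantities change.
def oscillate(omega, x0, t_end, dt=1e-4):
    x, v = x0, 0.0
    for _ in range(int(t_end / dt)):
        v -= omega**2 * x * dt   # "acceleration" opposes deflection: negative feedback
        x += v * dt
    return x

omega = 2.0                        # made-up natural frequency, same for both systems
period = 2 * math.pi / omega
x = oscillate(omega, x0=0.1, t_end=period)
print(round(x, 3))                 # after one full period, back near the start: 0.1
```

Swap in different physical constants for (deflection, velocity) or (charge, current) and the trace is the same cosine; that sameness is all that “formally identical” asserts.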

I hasten to say that no scientist would call a pendulum a negative feedback system as I have, but the reason isn’t mathematical; it’s that scientists normally use the term “feedback” to describe a relationship between one part of a system and another (wolf-population and deer-population, say), whereas the deflection and velocity of a pendulum bob aren’t different “parts” of a system — they’re different ways of describing *one* part. But the mathematics of feedback loops doesn’t know about parts and wholes; it only knows quantities and how they evolve in time under the sway of differential equations. And from that point of view, a simple harmonic oscillator could be considered the prototype of a system with oscillatory behavior due to negative feedback. Systems with negative feedback loops give rise to oscillating behavior, with oscillations that can decay over time, grow without bound^{4}, settle down into a stable cycle, or approach a single stable state.

Fiction is rife with cycles arising from negative feedback loops. A classic kind is the time-travel paradox: if you travel back in time and kill your grandfather before he has kids, then your father never exists, so *you* can’t exist, but then you never travel back in time, so your father does end up existing, and so do you, with the result that you do end up traveling back in time after all, etc. (Maybe calling this a feedback loop is a bit of a stretch, since it involves discrete variables — you’re alive or you’re not, your grandfather is alive or he isn’t — rather than continuous variables of the kind we’ve discussed so far.) You can also see the negative feedback loop governing a fictional couple for whom “A approaches B” leads to “B avoids A” leading to “A avoids B” leading to “B approaches A” leading back to “A approaches B”. If you have favorite examples of negative feedback loops in life or in literature, please let me know in the Comments.

My favorite example of a negative feedback loop in my own life comes from a time — and I hasten to say that this happened *many, many years ago* in case anyone who works for my auto insurance company is reading this — when I was about to leave my home to teach a calculus class and foolishly tried to save a bit of time by adjusting the seat while starting to drive. To slide the driver’s seat forward, one had to reach underneath the seat and pull on a release lever that would allow the seat to slide freely on its tracks. That’s what I did, and if I’d been smarter I would have moved the seat to its new position and let go of the release lever *before starting to drive*. But instead I started the car and put my foot on the gas pedal while my seat was still free to move forward and backward. Can you guess what happened before reading on?

The car started moving forward, but remember, objects at rest tend to remain at rest, so my seat (being free to slide) stayed put with respect to the street, which is to say that my seat moved backward with respect to the car. This caused my foot to leave the gas pedal. That caused the car to slow down, which caused my seat to move forward with respect to the car. That movement pressed my foot against the pedal again, which caused the car to lurch forward again, and so on. The negative feedback loop was unstable, so each successive motion of the car was more violent than the one before, until finally the car (which had manual transmission) stalled out.

In the twentieth century, back before the prefix “cyber-” got repurposed to mean “computer-y”, there was a worldview called cybernetics that saw feedback loops everywhere.^{5} Cybernetics bloomed in conjunction with an engineering discipline called control theory. A key concept of both fields is homeostasis, an equilibrium state achieved through use of a feedback loop. An example of an engineered feedback loop is the thermostat, which allows cooling to happen when something becomes too warm, and warming to happen when it becomes too cool. In the natural world, an example of equilibrium is seen in the predator-prey model I mentioned before. And you can thank feedback mechanisms built into your warm-blooded human body for keeping your core temperature in a zone that permits you to be alive right now. (Not to mention dozens of other metabolic variables that your body is always silently, cybernetically adjusting.)
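A thermostat’s loop is simple enough to sketch in a few lines (all temperatures and heating rates invented): the rule “heat when too cool, stop when too warm” settles the temperature into a small oscillation around the setpoint, which is homeostasis in miniature.

```python
# Toy thermostat (all numbers invented): classic bang-bang negative feedback.
def run_thermostat(temp, setpoint=20.0, band=0.5, steps=500):
    heating = False
    trace = []
    for _ in range(steps):
        if temp < setpoint - band:
            heating = True             # too cool: switch the furnace on
        elif temp > setpoint + band:
            heating = False            # too warm: switch it off
        temp += 0.2 if heating else -0.1   # furnace heat vs. ambient heat loss
        trace.append(temp)
    return trace

trace = run_thermostat(temp=10.0)
# After the initial warm-up, the temperature cycles within the comfort band:
print(all(19.0 < t < 21.0 for t in trace[100:]))
```

The on/off rule is negative feedback in its crudest form; the dead band around the setpoint is there for the same reason real thermostats use hysteresis, to keep the switch from chattering.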

The cybernetic worldview rightly stressed the importance of feedback loops (back in college I was especially enthralled with Gregory Bateson’s “Steps to an Ecology of Mind”), but many proponents of cybernetics thought that the study of feedback loops would explain everything. It didn’t, and now the word “cybernetics” itself has become an oddity, known to many only through the name of the fictional Sirius Cybernetics Corporation invented by Douglas Adams as part of his “Hitchhiker’s Guide” universe. Cybernetics had a great first act, but it ran out of new ideas with predictive power. And in a way that’s a shame, because the key insight of cybernetics is one that, regardless of whether it leads to new scientific advances, could lead humankind to a more sophisticated understanding of cause and effect.

Cybernetics was followed by catastrophe theory, chaos theory, complex systems theory, and so on. Each new wave yielded new explanatory power, and beyond that, new metaphors. With the new metaphors came hype. And within the scientific establishment there was some backlash to the hype (yes, feedback loops pop up everywhere once you start looking for them). I feel torn: hype has no place in scientific research, but in popular writing about science, tasteful enthusiasm for deep ideas has the power to awaken wonder and inspire an appreciation of just how beautifully complex the world is.

But I never told you what happened to Oedipus after he fulfilled the prophecy. When he was king of Thebes there was a great plague, and in trying to figure out what had caused it, he came to realize that he himself was the cause. And he acknowledged his culpability publicly, with an act of self-mutilation that symbolized his former blindness to the machinations of fate.

So, pity Oedipus. And while you’re at it, get ready to pity yourself and me and everybody else in the year 2020, because in the current year and the years after, many intelligent and well-meaning people will be doing their best to steer two complex systems (the biosphere and the world economy) that we don’t understand. We’re flying blind, and there’s a good chance that some of our interventions will backfire and have ironic consequences for reasons that will only be obvious in retrospect.^{6}

POSTSCRIPT:

Having written the preceding paragraph in early June and having re-read it in mid-June, I think our biggest problem isn’t that we humans will take action based on simplistic models of how the world works and that those actions will have perverse consequences. I think a bigger problem is that we’ll know the right thing to do and we’ll nevertheless fail to take the simple steps that actually work, just because we get tired of doing the right thing, day after day. It’s so hard. It was hard from the start, but at least at the start it was something different to do. Now it’s hard and *boring*.

And now that I’ve re-read *those* words, I see an even gloomier possibility. It seems clear that, at a societal level, abandoning social distancing now is self-destructive. But what if all the people going around in public without face masks this week are, in a certain sense, making completely rational individual decisions? After all, the main purpose of wearing a standard-issue mask is to protect others, not yourself. If you’re wearing a mask when no one else is, you’re putting up with discomfort and inconvenience for the sake of other people and their acquaintances — people you don’t even know^{7}, and if your action turns out to save someone’s life you’ll never find out about it. You’d be *so* much more comfortable if you weren’t wearing the mask; so taking off that mask would be the *rational* thing to do, wouldn’t it?

This pandemic won’t kill off humankind, and neither will the next. But if a species that evolved intelligence for its survival-value ends up going extinct because of the selfish rationality of its individual members, that might be the biggest irony of all.

*Thanks to Sandi Gubin, Dave Jacobi, Andy Latto, Fred Lunnon, Gareth McCaughan, Shecky Riemann, Evan Romer, and Steve Strogatz.*

Next month: Math, games, and Ronald Graham.

**ENDNOTES**

#1: This should not be confused with a situation in which making X bigger makes Y smaller and likewise making Y bigger makes X smaller. That might naively seem “even more negative”, but it’s actually an example of *positive* feedback, because a change in X will tend to reinforce itself rather than reverse itself over time.

#2: The cowardly lie “Nobody knew wildlife-management was so complicated” can be saved until after the candidate is safely elected.

#3: Here I’m assuming that the deflection angle *θ* is so close to 0 that we can replace sin *θ* in the differential equation by *θ*, resulting in a linear differential equation that gives a good approximation to the pendulum’s actual behavior. Nonlinear differential equations are extremely important, but nonlinearity isn’t one of the themes of this essay.
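In symbols, the approximation described in this endnote (with g the gravitational acceleration and L the pendulum’s length) is:

```latex
\ddot{\theta} = -\frac{g}{L}\sin\theta
\quad\longrightarrow\quad
\ddot{\theta} = -\frac{g}{L}\,\theta
\qquad(\text{for } |\theta| \ll 1),
\qquad
\theta(t) = \theta_0\cos\!\Bigl(\sqrt{\tfrac{g}{L}}\,t\Bigr)
```

where the closed-form solution assumes the bob is released from rest at angle θ₀.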

#4: When oscillations grow without bound in a linear model of a phenomenon, the upshot in the real world is that the oscillations grow until the system leaves the regime within which linearity is a good approximation to reality.

#5: The word “cybernetics” derives from the Greek κυβερνήτης (“steersman”), from the root meaning to steer, navigate, or govern; fittingly, “governor” was the name James Watt gave to the feedback mechanism of his steam engine.

#6: For instance: having everybody stay home during a pandemic seems like a good way to prevent viral transmission, and it probably is, on a societal level. But if the severity of an infection depends on initial viral load, this strategy has the unintended effect that uninfected people who shelter with infected people stand to get worse cases of the illness. I learned about this from Siddhartha Mukherjee’s article listed in the References. It’s not clear to me how many epidemiologists or public health officials took this into account in the earliest days of the coronavirus pandemic. Who knew? Did they try to tell us? Were we listening hard enough?

#7: I’m thinking of words from the Twilight Zone episode “Button, Button” that we hear more than once (with chilling implications the last time we hear it): “… someone whom you don’t know.” If there’s a more horrifying dramatization of the tragedy of the commons, I haven’t seen it.

**REFERENCES**

Tim Harford, “LaLa Land: Galileo’s Warning”, http://timharford.com/2019/11/cautionary-tales-ep-3-lala-land-galileos-warning/

Siddhartha Mukherjee, “How Does the Coronavirus Behave Inside a Patient?”, The New Yorker, March 26, 2020. Yes, I know this article is reportage, not science. If any of you can point me toward relevant medical literature, please do so in the Comments.

“So let me get this straight, Mr. Propp: you plan to go to England to work with a mathematician who doesn’t even know you exist?”

It was 1982, I was a college senior applying for a fellowship that I hoped would send me to Cambridge University for a year, and the interviewer was voicing justified incredulity at my half-baked plan to collaborate with John Conway.

I’d read about Conway and his multifarious mathematical creations in Martin Gardner’s Mathematical Games column in Scientific American, and I’d become an ardent fan; I’d devoured his book “On Numbers and Games” and I’d even done some epigonic^{1} work on my own, trying to extend the theory of two-player games to allow for a third player. But I hadn’t even taken the step of writing to the man, and I had to sheepishly admit as much to the interviewer.

“You sound a bit like Luke Skywalker heading off to meet Yoda,” the interviewer said. His jest made me worry that I wouldn’t get the fellowship, but he must have believed in me more than his joke suggested. I was awarded a Knox Fellowship, and later that year I went to England on a Knox, as I liked to say (enjoying the resulting homophonic confusion).

**“DR. CONWAY WANTS TO TALK TO YOU”**

Conway was a celebrity among mathematicians but hadn’t risen to the top academic rank at Cambridge University. Perhaps that was partly due to his refusal to draw a line between the serious and the playful the way most mathematicians do. After he’d discovered three eminently respectable algebraic structures called the Conway groups^{2}, he’d resolved that from then on he would devote himself to whatever interested him regardless of what other people thought. This resolution showed itself clearly in his subsequent output. Conway’s most profound and distinctive contribution to mathematics, his theory of surreal numbers^{3}, was shot through with inspirations coming from the study of games, and the achievement he was best known for in the broader world was his invention of a kind of computer-aided solitaire called Conway’s Game of Life. And, unlike most mathematicians, Conway didn’t confine his research to one particular area; his breadth of interests would have smacked of dilettantism if he hadn’t made fundamental discoveries in the topics he turned his attention to. It was hard for more traditional academics to know what to make of him. When I arrived at Cambridge University, Conway was a Lecturer, not a Professor.

Later on, I realized I’d been lucky that Conway wasn’t off taking a sabbatical somewhere (perhaps in the U.S.) during the year I’d left the U.S. to work with him in England!

I found Conway in the Trinity College Common Room one day. He was (as he would remain throughout his life) happy to have a conversation with a stranger. I introduced myself and told him about the work I’d been doing on three-player games. I was looking at what are called *impartial* games, of which the prototypical example is Bouton’s game of Nim. In Nim, the “board” consists of one or more heaps of counters, and a legal move for a player is to take away as many counters as the player wants from a single pile. The player who makes the last move wins. What makes the game “impartial” is that every move that is available to one player is available to the other. There’s a beautiful mathematical theory of how to win impartial two-player games, and I told Conway that I wanted to extend it to three players who take turns in cyclical fashion.
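The “beautiful mathematical theory” alluded to here is, for two players, Bouton’s: a Nim position is a loss for the player about to move exactly when the bitwise XOR of the heap sizes (the “nim-sum”) is zero, and from any position with nonzero nim-sum there is always a move that zeroes it. A quick sketch:

```python
from functools import reduce
from operator import xor

def nim_sum(heaps):
    """Bouton's invariant: the bitwise XOR of all heap sizes."""
    return reduce(xor, heaps, 0)

def winning_move(heaps):
    """Return (heap_index, new_size) restoring nim-sum 0, or None if we're losing."""
    s = nim_sum(heaps)
    if s == 0:
        return None            # every move hands the opponent a nim-sum-0 position
    for i, h in enumerate(heaps):
        if h ^ s < h:          # this heap can be shrunk to make the nim-sum 0
            return i, h ^ s
    # unreachable: when s != 0, some heap has the top bit of s set

print(winning_move([3, 4, 5]))   # → (0, 1): shrink the heap of 3 down to 1
```

Bouton’s theorem is for the two-player game only; the essay’s point is precisely that no comparably simple theory was known for three players.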

If we’re watching a three-player impartial game in progress, and we freeze the action, there are four possibilities. First, the player who’s about to move (call her Natalie) could have such a strong position that, if she keeps her wits about her, she can guarantee that she’ll make the last move and win the game, regardless of what her two adversaries do. Second, the player who gets to move after Natalie (call him Oliver) might have a strategy that lets *him* win, no matter what. Third, the player who gets to move after Oliver (call him Percival) might have a winning strategy against the other two. Lastly, it’s possible that *no* player has a winning strategy — that any two players have the ability to defeat the third (leaving aside the issue of how coalitions might form or dissolve, or how agreements to share a prize might be enforced). I called these four cases N, O, P, and Q. Much of my preparatory work on the problem of classifying positions in three-player games could be summarized in the following table, whose simplicity hides the amount of work required to justify it:

For instance, the upper left entry signifies that if we have two positions, each of which is of type P, and we smoosh them together (that is, form the composite position in which a move consists of moving in either one of the two games), then the resulting position is either of type P or of type Q.

I showed Conway the table and described what I was hoping to do next. What I didn’t tell him — what I didn’t have the guts to tell him — is that I was hoping to do the work with him. I mean, who was I, not even a proper graduate student, to suggest that it would be worth Conway’s time to collaborate with me? John expressed approval of my research plan, wished me luck, and encouraged me to let him know how things went. And that, I assumed, was that.

A few days later, one of my flatmates told me that while I’d been off attending classes there’d been a telephone call for me from some secretary at DPMMS (the Department of Pure Mathematics and Mathematical Statistics). When I called her back, she said “Dr. Conway wants to talk to you,” but she wouldn’t tell me more, except to say that he had seemed upset. She arranged a place and time for Conway and me to meet, and when I met him, he began by apologizing.

“This is your project, not mine,” he said, “but I couldn’t stop thinking about it, and I even did some work on it. I’m sorry. I don’t usually do things like this.”

I reassured him that I’d hoped all along that we’d work on three-player games together. He was relieved, because he said he didn’t think it was right for someone to poach someone else’s research project (especially that of a younger person).

As it turned out, guilt wasn’t the only thing bothering John; he was also frustrated mathematically. “I was able to prove all the claims in your addition table,” he said, “but I couldn’t figure out how to prove that the sum of two type-O positions can’t be another type-O position.”

“Oh yes, that’s the hard one,” I said, and I showed him how the proof went.^{4} And so my collaboration with John Conway was launched.

I hasten to say that the proof I showed John was totally in the style that I’d learned from his book, and that it didn’t use any tricks he didn’t already know. It just took more work. Consider that I’d had months to work on the theory; John had only learned about it from me a few days before. I had no doubt then (and have no doubt now) that if John had set aside a few hours for the task, as I had done earlier that year, he would’ve found the proof. But I won’t deny that it was a huge boost to my ego to know that I’d proved something about games that had stumped him the first time he tried to prove it.

**WITH CLIPBOARD AND BABY**

Usually we would meet at a local coffee shop called Fitzbillies. John was the scribe, filling page after page with calculations. One time, shortly after the birth of his son Oliver, we worked at his home, where I got to meet his then-wife, the mathematician Larissa (“Lara”) Queen. We entered his home through the back door, set in a featureless wall that faced a parking lot; it reminded me a bit of Bag End from “The Hobbit”. Sometimes he’d bring little Oliver to the cafe with us, and John would somehow balance the clipboard and the baby and do math while keeping his young son happy.

I wish that I’d saved some of those pieces of paper, or that I even remembered some of what was on them. The phrase “tribal markings” has stayed with me, as the name John gave a system for discriminating between positions based on how they behaved when you added *k* Nim heaps of size 1 to them, for *k* = 1, 2, 3, … . Ultimately, what sank the enterprise (or at least my enthusiasm for it) was that John’s extension of my theory didn’t seem to apply to any actual games in an interesting way. In nearly all positions in nearly all three-player impartial games, any two players can gang up on the third if they make a plan and stick to it. Years later, it occurred to me that John and I should have taken a break from delving into 3-player impartial games and taken a look at 4-player impartial games. Even though in most such games any three players can gang up on the fourth, one can look at how one two-player alliance fares against another, and we might have found something worth publishing. In the end, many years later, I did publish an article on three-player games, but it didn’t include any of the work I’d done with John in Cambridge.

I also attended a class John taught on Games, Groups, Lattices, and Loops, and while I didn’t warm to most of the topics he covered (perhaps because I didn’t put in any time playing with them on my own), I was struck by the way his ideas about games turned out to play a role in his work on lattices, codes, and packings, with connections that became even clearer in the decade that followed (see his article with Sloane listed in the References). What are the chances that a mathematician who loved games would have the luck to find that games secretly underlie other subjects he studies? It almost seemed as if mathematical reality was bending itself to his will — that he had “root access” to the Platonic realm of pure form.

Of course, I don’t really believe that. The likeliest explanation is that what attracted John to work in these areas was some subtle affinity between them — an affinity that reflected some hidden mathematical substructure they had in common. Which problems seize a mathematician’s fancy, and which ones leave a mathematician’s soul unstirred? These things are as mysterious as physical attraction, but just as some people have a physical type they’re attracted to, I think John had a “type” in the mathematical domain, so that even though his interests were broad, there is something “Conway-ish” about the problems he tackled, and he had a keen sense of smell when it came to scenting out Conway-ish problems.

**PRINCETON AND ELSEWHERE**

Not many years after I came to visit him in England, John moved to the U.S. and became a professor at Princeton. He traveled a lot, giving talks at conferences across the country, participating in research retreats, and even spending time at mathematical summer camps for high schoolers and middle schoolers. I went to many of Conway’s talks, some of them rather outrageous (I remember the time when, lacking a damp paper towel, he licked an overhead slide clean so that he could re-use it), but the thing that I found most striking is that he never gave the same talk twice. For him, giving a talk was an improvisatory performance, an extension of his love of one-on-one conversation.

I remember a time when I was hoping to snag Conway’s interest in a problem that struck me as Conway-ish, hoping to recreate the sort of collaboration we’d had in 1982. I drew a triangle with some lines cutting through it, something like this,

and then I asked Conway if he could add more lines so that all the small regions cut out by the lines were triangles. He thought a bit, and drew some lines:

I nodded, and then said “Do you think that if I draw *any* finite number of lines passing through a triangle, there’s always a way for you to add more lines so that *all* the small regions cut out by the lines are triangles?” He said “Let me think about that,” and as the curiosity-bug bit into his brain, he began to draw pictures, make observations, and formulate conjectures. But he was no fool; he could see that I was deliberately trying to entice him into working on the problem, and he was too proud to want to be seen as one who is so easily seduced. He shuddered as if shaking off an unpleasant memory and said “You know, I don’t have to work on just ANY damned problem!”

The problem is still unsolved, as far as I know. (For more info, see the Math Forum webpage listed in the References.)

**ATLANTA**

During the past twenty years, most of my conversations with Conway took place in Atlanta, at a meeting of math-y, magic-y people held every two years called the Gathering for Gardner. (Actually the “for” is officially supposed to be rendered as the number “4”, but I find the cutesiness a bit too much.) I never had a chance to talk to Gardner himself at one of these Gatherings; he stopped attending in the late 90s, partly because he wanted to be with his wife who didn’t like travel, but mostly because he hated adulation.^{5}

John, on the other hand, never minded being the center of attention, and rarely missed a Gathering. He loved to talk, and people loved to listen. He, however, didn’t always like to listen (especially as he grew older), and he seldom went to the formal talks held in the ballroom, preferring to linger in the anteroom and converse with whoever was interested in talking to him. This became a problem for me, because I didn’t want John to talk to just anyone — I wanted him to talk to *me*: about frieze patterns, boundary invariants for tilings, sphere packings, surcomplex numbers, group theory, knot theory, etc. John, however, was just as happy to perform magic tricks for strangers as he was to discuss our shared mathematical interests. Or perhaps he sometimes found my company dull, and was happy for the relief provided by other interlocutors eager to chat with him on other topics?

One of the last times I saw Conway was at a Gathering for Gardner in Atlanta in 2014. This Gathering was held specifically in John’s honor, and I gave a talk there on his indirect contribution to the theory of random tilings. Characteristically, he wasn’t at the talk; he preferred one-on-one conversation to attending presentations. If you were there that year, you might at one point have spied me sitting in the anteroom on the floor at Conway’s feet, and you might have thought it looked odd. Why had I adopted this undignified position? Because I had Conway’s ear, I needed to sit, there was no other chair, and I feared that if I left to get a chair, someone else would snag his attention and I’d never be able to finish the conversation.

It’s only recently occurred to me that, to the extent that Conway in later life became a less considerate person, the attention of fans like myself may have played a facilitating role. One reason people behave as well as they do is that bad behavior comes at a social price. If you’re an ordinary person, spending most of your time in a particular place, hanging out with a limited supply of people, and you’re rude to enough people for a long enough time, you’ll eventually run out of people who seek your company. But when you’re a star the way Conway was, there’s always another eager fan to shower you with attention, no matter how many people you’ve alienated. John was never unkind to me (the rudest thing he ever said to me, after I made some intelligent comment, was “You’re not as dumb as you look” and I think he meant it affectionately), but I’ve heard from a few others (women, I’m sorry to say) to whom he was not so polite, or to whom he displayed a creepy kind of attentiveness. I’ve written elsewhere about geniuses, but it now seems to me that a deeper problem has to do with how communities choose heroes, and how communities treat those heroes. I feel torn between two uncomfortably clashing beliefs: that heroes are necessary or at least inevitable, and that hero-worship damages the souls of the worshipper and the worshipped. Maybe some readers of this essay have thought more deeply about this than I have and will have useful insights.

**THE END**

Conway succumbed to COVID-19 in April 2020. I’m glad that while he was still alive I let him know how big a role he played in my life (something I neglected to do in the case of Martin Gardner). I think it’s fair to say that he was the Beatles of mathematics, not just because he was from Liverpool, but because so much of John’s work is so damned catchy. Just as many Lennon-McCartney songs have a memorable “hook”, many of John’s best creations have a way of sticking in the mind once you understand them, in a way that most mathematical discoveries don’t. A layperson with an interest in mathematics can get a surface appreciation of John’s work in a way that just isn’t possible for 99 percent of contemporary mathematical research. Here’s one example of an especially accessible Conway theorem (an isolated aperçu as opposed to a piece of a bigger story): if you extend each side of triangle *ABC* beyond both of its endpoints, going past each vertex by the length of the opposite side, the six points you obtain all lie on a single circle, centered at the incenter of the triangle.

It’s hard to believe that as simple a geometric proposition as Conway’s circle theorem could have lain undiscovered for more than a score of centuries, but it did. (For a beautiful proof-without-words of this proposition, see the proof by Colin Beveridge listed in the References.)

It’s easy for songwriters to feel that all the best tunes, chord progressions, and hooks have already been used by the songwriters who came before. Likewise, if you’re a pure mathematician whose job is to create new games of pure thought, it’s easy to feel that all the beautiful simple ideas have already been thought of — that our forebears have already turned the mathematical topsoil, leaving us the more arduous task of cutting through rock in search of undiscovered gems. The main thing I learned from John is that even if the supply of beautiful yet simple mathematical truths is in some sense finite, we’re nowhere near the bottom of it. Conway’s career is an existence proof that a career like his is possible, or at least was possible up through the year 2020. I hope and believe that at least throughout my lifetime there’ll still be plenty of scope for mathematicians of his temperament to find new thought-games that somehow manage to be compellingly simple yet enduringly deep.

*Thanks to Tibor Beke, Nancy Blachman, Sandi Gubin, David Jacobi, Joe Malkevich, Evan Romer, and Shecky Riemann.*

Next month: The Mathematics of Irony.

**REFERENCES**

BAAM! (Bay Area Artists and Mathematicians) and G4G (Gathering 4 Gardner), Remembering John Conway, https://youtu.be/Ru9fX3VPR9Y

Matt Baker, “Some mathematical gems from John Conway”, https://mattbaker.blog/2020/04/15/some-mathematical-gems-from-john-conway/ .

Colin Beveridge, Conway’s circle, a proof without words, https://aperiodical.com/2020/05/the-big-lock-down-math-off-match-14/ .

John Conway, On Numbers and Games.

John Conway and Neil Sloane, “Lexicographic Codes: Error-Correcting Codes from Game Theory”, IEEE Transactions on Information Theory, Vol. IT-32, No. 3, May 1986; available at http://neilsloane.com/doc/Me122.pdf .

Donald Knuth, Surreal Numbers: How Two Ex-Students Turned on to Pure Mathematics and Found Total Happiness, 1974.

MathOverflow, Conway’s lesser-known results, https://mathoverflow.net/questions/357197/conways-lesser-known-results .

James Propp, “Three-player impartial games”, Theoretical Computer Science 233 (2000), pp. 263–278; available from https://arxiv.org/abs/math/9903153.

Jim Propp, “Conway’s impact on the theory of random tilings”, talk presented at G4G11 in 2014; video at https://www.youtube.com/watch?v=e_729Ehb4vQ .

Siobhan Roberts, Genius at Play: The Curious Mind of John Horton Conway.

Siobhan Roberts, “Travels with John Conway, in 258 Septillion Dimensions,” New York Times, May 16, 2020; at https://www.nytimes.com/2020/05/16/science/john-conway-math.html .

Stan Wagon, Math Forum Problem-of-the-Week 812, “A Pre-Sliced Triangle”; at http://mathforum.org/wagon/spring96/p812.html .

**ENDNOTES**

#1. The term “epigone” is usually an insult; who wants to be called “second-rate”? But if the tier extends beyond second-rate to third-rate, fourth-rate, etc., being second-rate isn’t so bad! And it’s no disgrace to be deemed not-as-good-as-Conway.

#2. The Conway groups are examples of algebraic structures called finite simple groups. There are several infinite families of finite simple groups and then twenty-six “bonus” finite simple groups that we call sporadic, including the three Conway found. The largest of the sporadic finite simple groups is called the Monster, and near the end of his life Conway confided that, although it was his fondest wish to understand why the Monster existed, he doubted that he would live that long.

#3. The surreal number system is an extension of the ordinary real number system that includes infinite and infinitesimal quantities as well as familiar numbers like seventeen and the square root of two. The term “surreal numbers” was coined by Donald Knuth, whose book on the subject occupied me for many happy hours when I was in high school.

#4. In my article “Three-player impartial games”, the proposition that gave Conway trouble appears as Claim 7, and it hinges on five of the six preceding claims.

#5. I would sometimes bring students to attend the Gathering with me. One student’s father had initial misgivings about having his daughter attend some sort of strange convocation held in honor of a man who didn’t even show up. I’m guessing it reminded him of those ashrams in the U.S. that are dedicated to the teachings of a guru back in India. To him, Gardner-fandom seemed like a bit of a cult. And I can’t say he’s completely wrong.

This sort of invocation of chemistry as a magic history-spanning bridge can be traced back to James Jeans, the English scientist and mathematician, who in his 1940 book “The Kinetic Theory of Gases” wrote: “If we assume that the last breath of, say, Julius Caesar has by now become thoroughly scattered through the atmosphere, then the chances are that each of us inhales one molecule of it with every breath we take.” The science writer Sam Kean recently wrote an entire book, “Caesar’s Last Breath”, that takes this proposition as its starting point.

In between Jeans and Kean, other writers making the same point have replaced Caesar by Archimedes or Jesus or da Vinci. I prefer Archimedes, because he was the first of the ancient Greek mathematicians to come to grips with really big numbers and to connect the macroscopic and microscopic realms; in “The Sand Reckoner” he calculated how many grains of sand would fill the universe as the Greeks understood it.

As I write this essay in April 2020, human society has been violently tipped on its side, and the eight billion or so people who share this planet have come to realize how small the world has become epidemiologically. We’ve also become fearfully conscious of the contents of the air we bring into our bodies. Perhaps now is a good time to take a deep and hopefully healthy breath and think a bit about how the content of our lungs connects us to people far away in space and time, situated in a past that, even at a remove of a few months, feels very distant.

Molecules are tiny; Earth is huge; we’re somewhere in between. Our brains didn’t evolve to handle the difference in scale between microscopic events and events of daily life, or between events of daily life and global processes. We can fling around words and phrases like “nanotech” and “trillion dollar deficit”, but few of us really *get*, on a gut level, how small a nanometer is or how big a trillion is.

And yet, neuroanatomical evolution has already developed a wonderful approach to the problem of scale. Consider for instance the human ear; it must process a gamut of frequencies from 20 to 20,000 cycles per second. It does this using an organ called the cochlea, whose thousands of tiny hair cells respond to different frequencies. When the ear picks up a tone, the position of the hair-cell along the cochlea that responds to that particular tone corresponds roughly to the logarithm of that tone’s frequency — which may sound intimidating if you’re rusty with logarithms, but if you’ve ever played a piano you have an intuitive, kinesthetic sense for the logarithms of frequencies. Each time you shift your hand up an octave, you double the frequency of the note. Frequency is an exponential function of the position of your hand, and conversely, the position of your hand is the logarithm of the frequency produced. The cochlea is just like that, except that it’s receiving sound, not producing it. Some have called the cochlea an “inverse piano” to highlight this analogy.^{1}
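If you’d like to see the piano-as-logarithm idea in numbers, here’s a short Python sketch. The tuning details (twelve-tone equal temperament, key 49 sounding A at 440 Hz) are my assumptions for the sake of concreteness, not anything the essay commits to:

```python
import math

def key_frequency(n, a4_key=49, a4_hz=440.0):
    """Frequency of the n-th piano key under equal temperament:
    each of the 12 semitones in an octave multiplies the frequency
    by 2**(1/12), so moving up 12 keys doubles the frequency."""
    return a4_hz * 2 ** ((n - a4_key) / 12)

# Moving the hand up one octave (12 keys) doubles the frequency...
f_a4 = key_frequency(49)
f_a5 = key_frequency(61)
print(f_a4, f_a5)              # 440.0 880.0

# ...so the position of the hand is the logarithm of the frequency:
print(math.log2(f_a5 / f_a4))  # 1.0 (one octave)
```

Frequency is exponential in hand position, and hand position is logarithmic in frequency, which is exactly the cochlea’s trick run in reverse.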

We can use exponentials and logarithms to try to get a handle on the large and small, but it’s easy to forget the key difference between counting “one, two, three, four, …” and counting “thousand, million, billion, trillion, …”: the former is an *arithmetic* progression (each term is equal to the previous term *plus* something, namely 1) while the latter is a *geometric* progression (each term is equal to the previous term *times* something, namely 1000). In more concrete terms: If we plot the four numbers one, two, three, and four on a number line, we get this:

On the other hand, if we plot the four numbers one thousand, one million, one billion, and one trillion on a number line, we get this:

Were you expecting to see four dots? Well, the “dot” at the left is actually two dots, one for one thousand and one for one million; at this scale the two dots are too close together to be distinguished. Meanwhile, the dot for one trillion is about a mile off to the right.
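To make the mile-off-to-the-right claim concrete, here’s a quick Python computation. The scale (one trillion placed exactly one mile from zero) is my assumption, chosen to match the description above:

```python
# If one trillion sits one mile (63,360 inches) to the right of zero,
# where do the other dots land on the same number line?
MILE_IN_INCHES = 5280 * 12
scale = MILE_IN_INCHES / 1e12   # inches per unit

for label, n in [("thousand", 1e3), ("million", 1e6),
                 ("billion", 1e9), ("trillion", 1e12)]:
    print(f"{label:>8}: {n * scale:.6f} inches from zero")
```

On that scale, thousand and million both land within a tenth of an inch of zero (one indistinguishable dot), billion sits about five feet out, and trillion is the full mile away.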

If you’ve never seen the video “Powers of Ten” or the similar video “Cosmic Zoom”, I suggest you take a break from reading this essay and watch one or both of them. Touring the universe from the largest scales we know about to the smallest is a great way to get a feeling for how the different levels of our universe fit together. The largest structures we know about are roughly 10^{41} times larger than the smallest. We’re somewhere in between, on a cosmic piano that has roughly a hundred and forty octaves.^{2}
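The octave count is easy to check in a line or two of Python, taking the essay’s figure of 10^{41} as given:

```python
import math

# Ratio between the largest and smallest length scales in the universe,
# using the essay's figure of 10**41
ratio = 10 ** 41

# Number of "octaves" (doublings) spanned by that ratio
octaves = math.log2(ratio)
print(round(octaves))   # 136, i.e. roughly a hundred and forty
```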

When one does calculations that involve big things, small things, and things that are in between, one sometimes finds that the in-between things are close to the midpoint on a logarithmic scale, in the way that middle C is close to the midpoint of a piano keyboard. One example of this phenomenon is the proposition that there are about as many molecules in a teaspoonful of water as there are teaspoonfuls of water in all Earth’s oceans (about 200 sextillion in both cases). A more mind-boggling example is one I learned from Bill Gosper, who computed that a molecule of polyethylene^{3} spanning the observable Universe, suitably folded, would just about fit in NASA’s Hangar 1, one of the largest buildings ever constructed. Another phenomenon along similar lines is the way you can use an oil drop to measure the wavelength of visible light; the drop is much larger than the wavelength of light, but ponds are much larger than droplets, so when you let a droplet spread evenly over the surface of a pond, you can create a layer of oil so thin that the resulting interference patterns let you determine the wavelength of the light.^{4}
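Here’s a back-of-envelope check of the teaspoon claim in Python. All the specific figures (a 4.93 mL teaspoon, an ocean volume of about 1.335 billion cubic kilometers, Avogadro’s number) are rough estimates of mine, not numbers from the essay:

```python
AVOGADRO = 6.022e23
TEASPOON_ML = 4.93          # one US teaspoon, in milliliters
WATER_MOLAR_MASS = 18.0     # grams of water per mole
OCEAN_VOLUME_KM3 = 1.335e9  # rough estimate of Earth's oceans

# Molecules of water in one teaspoon (1 mL of water weighs about 1 gram)
molecules_per_teaspoon = TEASPOON_ML / WATER_MOLAR_MASS * AVOGADRO

# Teaspoons of water in all the oceans (1 cubic km = 1e15 mL)
teaspoons_in_oceans = OCEAN_VOLUME_KM3 * 1e15 / TEASPOON_ML

print(f"{molecules_per_teaspoon:.2e}")  # roughly 2e23: about 200 sextillion
print(f"{teaspoons_in_oceans:.2e}")     # roughly 2e23: about 200 sextillion
```

Both numbers come out near 2 × 10^{23}, which is to say the teaspoon really does sit near the logarithmic midpoint between a water molecule and the world ocean.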

The claim about Caesar’s last breath is yet another a story about three length-scales, spanning the logarithmic ladder from molecules to people to planets. How big are these things? The diameter of the planet is about eight million times the height of the average human adult, which in turn is about five billion times the diameter of a molecule of air. With such disparate ratios (eight million versus five billion) it might seem that in the range from single molecule to entire atmosphere we humans are off-center, logarithmically speaking, but that’s because we’re ignoring two important things: gas kinetics and the shape of the atmosphere. Molecules in a gas aren’t packed like oranges at the grocer’s; they’re constantly jostling one another, in an all-against-all molecular melee that results in far fewer molecules per liter than the size of a molecule would suggest. Also, our atmosphere is not a ball of gas but a *hollow* ball, eight thousand miles across from its northernmost point to its southernmost but only ten miles thin; in relative terms, that’s five times thinner than the shell of a chicken’s egg. When you do the math (as Archimedes would have loved to do, given his famous work on the volume and surface area of spheres), you find that the number of molecules of air in a lung is quite close to the number of lungfuls of air on Earth. And this suggests that the number of molecules from Caesar’s last breath in your lungs right now is approximately 1.
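When I say “do the math,” here is the sort of calculation I mean, sketched in Python. Every constant below (a half-liter breath, a 22.4-liter molar volume, the mass and mean molar mass of the atmosphere) is a rough textbook figure of my choosing, so treat the output as an order-of-magnitude estimate only:

```python
AVOGADRO = 6.022e23
BREATH_LITERS = 0.5          # volume of one breath (rough)
MOLAR_VOLUME_L = 22.4        # liters per mole of gas at STP (rough)
ATMOSPHERE_KG = 5.15e18      # total mass of Earth's atmosphere (rough)
AIR_MOLAR_MASS_KG = 0.029    # mean molar mass of air, in kg per mole

molecules_per_breath = BREATH_LITERS / MOLAR_VOLUME_L * AVOGADRO
molecules_in_atmosphere = ATMOSPHERE_KG / AIR_MOLAR_MASS_KG * AVOGADRO

# If Caesar's last breath is by now thoroughly mixed into the atmosphere,
# the expected number of its molecules in your current lungful is
# (molecules per breath)**2 / (molecules in atmosphere):
expected = molecules_per_breath ** 2 / molecules_in_atmosphere
print(f"{expected:.1f}")   # on the order of 1
```

Equivalently, molecules-per-breath divided by lungfuls-on-Earth comes out near 1, which is the coincidence of scales the essay describes.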

The answer is sufficiently close to 1 that it’s probably sensitive to issues ignored in our oversimplified mixing model.^{5}

Jeans’ claim ignores the massive amount of molecular recombination going on in our atmosphere. In the chemical dance of geological and biological processes, oxygen and nitrogen atoms (the primary constituents of air) change partners all the time. It’s conceivable that most of the oxygen molecules in Caesar’s last breath got split long ago, and hence, strictly speaking, no longer exist. Of course, we could rescue Jeans’ claim by replacing molecules by atoms, and then similar calculations would apply.

I myself prefer to go back to the formulation that I read as a child, the one that talks about Archimedes’ lifetime pulmonary output instead of his dying breath; aside from the fact that it’s less morbid, it’s also much more likely to be true. That extra factor of half a billion, coming from all those breaths, makes the proposition much more certain, even if some of those breaths contained molecular “repeats”, and even if some of the molecules escaped into outer space, or sit sequestered in permafrost, or were cleft by lightning or metabolism.

I suggest Terry Tao’s lecture “The Cosmic Distance Ladder” as a follow-up to “Powers of Ten” and “Cosmic Zoom”. But such pedagogical tools can only go so far to give us a feeling for the power of raising things to powers. An old story from India tells how a grand vizier, having invented the game of chess for the ruler’s enjoyment, asks that his reward be one rice grain for the first square of the board, two grains for the second, four grains for the next, eight grains for the next, and so on, up until the 64th and last square of the board. The king thinks the vizier is letting him off easy and agrees to his terms. Only later, when he starts trying to pay the rice to the vizier, does he discover that he’s made a mistake. How much rice was the vizier asking for? 1+2+4+8+…+2^{63} comes to about 18 quintillion grains of rice, which is a thousand times greater than the amount of rice that is currently grown in the world in a year. In one version of the story, the king, upon realizing that all the rice in his kingdom wouldn’t suffice, nullifies his promise by having the vizier executed.
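The vizier’s total is a one-liner in Python, since Python handles big integers exactly:

```python
# One grain on the first square, doubling on each of the 64 squares
grains = sum(2 ** k for k in range(64))

print(grains)                  # 18446744073709551615
print(grains == 2 ** 64 - 1)   # True: the doubling sum telescopes
```

That’s about 1.8 × 10^{19} grains, the “18 quintillion” of the story.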

Even when you think you understand exponential growth, it’s easy to slip up. Here’s an example from The Giant Golden Book of Mathematics, a book I loved as a child and still admire: “An amoeba is placed in an empty jar. After one second, the amoeba splits into two amoebas, each as big as the mother amoeba. After another second, the daughter amoebas split in the same way. As each new generation splits, the number of amoebas and their total bulk doubles each second. In one hour the jar is full. When is it half-full?” It’s tempting to answer “half an hour”, but the correct answer is one second before the hour is up. Actually, an even better answer is “That’s a ridiculous question.” There are 3600 seconds in an hour, and 3600 rounds of doubling would lead to a growth of the initial biomass by a factor of more than 10^{1000}. There aren’t enough octaves on the cosmic piano for that. Before the hour is up, the amoebas would fill all of the known universe.
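Python’s exact big-integer arithmetic makes the absurdity of the amoeba problem easy to quantify (the 10^{80}-atom figure for the observable universe is the usual rough estimate, not a number from the essay):

```python
import math

# 3600 seconds of doubling multiplies the biomass by 2**3600.
growth = 2 ** 3600
print(len(str(growth)))   # 1084 decimal digits, i.e. growth > 10**1083

# For comparison: roughly 266 doublings already exceed the
# ~10**80 atoms usually estimated for the observable universe.
print(math.ceil(math.log2(10 ** 80)))   # 266
```

So the jar, the planet, and the universe are all overrun within the first five minutes of the hour.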

Those imaginary amoebas teach us something that we forget at our peril: exponentially growing quantities look negligible until they don’t — or look innocuous until it’s too late to do anything about them. Why worry if ten grains of rice become twenty? It’s still less than a handful. Why worry if ten cases of a communicable disease become twenty? More people die each year from falling out of bed.^{6} It’s easy to dismiss things that are growing exponentially when they’re small. Albert Allen Bartlett famously wrote “The greatest shortcoming of the human race is our inability to understand the exponential function.” Let’s hope we as a species can avoid the grand vizier’s fate.

To end on an upbeat (or dare I say “inspiring”?) note, it’s worth remembering that the same solar energy that kindled Archimedes’ brain by way of chemical bonds in the oxygen he breathed also feeds *our* brains. If we focus enough brainpower on the problems we face as a species, it’s possible we’ll be able to come up with ways to cope with the current crisis and stumble our way through to the next crisis, and the next, and the next.

*Thanks to John Baez, Bill Gosper, Sandi Gubin, Hans Havermann, Michael Kleber, Henri Picciotto, Evan Romer, and Simon Plouffe.*

Next month: Confessions of a Conway Groupie.

**ENDNOTES**

#1. The central nervous system must have its own tricks for dealing with the problem of disparate scale; for instance, perceptible levels of loudness, from the barely discernible to the headache-inducing, span many orders of magnitude, as do perceptible levels of illumination. If you know something about how the brain encodes intensity of auditory and visual stimuli, please post in the comments!

#2. Thinking of this piano puts me in mind of a scene from “The 5000 Fingers of Dr. T.“, which as a child I found so disconcerting that I couldn’t watch the movie.

#3. Polyethylene is a chain of hydrogens attached to a carbon backbone of indefinite length, so in principle a polyethylene molecule could be long enough to span the observable universe; this hypothetical molecule, if folded up tightly, would fit inside Hangar 1.

#4. Can anyone provide a good reference for this?

#5. When we’re dealing with quantities much bigger than 1, or much smaller, an order of magnitude or two usually doesn’t have a qualitative effect on the conclusions we can draw, but that’s not the case when quantities are logarithmically close to 1, or 10^{0}. If the expected number of “special” molecules in our lungs at any given time is computed to be around 10^{−2} = .01, then we could say with confidence that most of the time our lungs don’t contain any. On the other hand, if the expected number of special molecules in our lungs at any given time is computed to be around 10^{2} = 100, and we model the number of such molecules in our lungs as a Poisson random variable, theory tells us that the standard deviation is 10, so that the probability that our lungs contain none at all right now — a “ten-sigma event” — is minuscule.
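For the curious, the “minuscule” probability is easy to compute: for a Poisson random variable the chance of observing zero events is *e* raised to the negative of the mean.

```python
import math

# Poisson with mean 100: standard deviation is sqrt(100) = 10,
# and the probability of seeing zero events is e**(-100).
mean = 100
p_zero = math.exp(-mean)
print(p_zero)   # about 3.7e-44: minuscule indeed
```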

#6. Propagation of a novel disease through a vulnerable population is described pretty well by an exponential function in the early stages of the epidemic, when most of the population is immunologically naive. In later stages of the epidemic, the sigmoid curve predicted by the logistic model provides a better fit.
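A tiny Python sketch shows the two regimes; the growth rate and carrying capacity below are made-up illustration parameters, not epidemiological estimates:

```python
import math

# Early on, the logistic curve and the pure exponential agree;
# later, the logistic flattens toward the carrying capacity K.
r, K, x0 = 0.2, 1000.0, 1.0   # assumed rate, capacity, initial size

def exponential(t):
    return x0 * math.exp(r * t)

def logistic(t):
    return K / (1 + (K / x0 - 1) * math.exp(-r * t))

for t in (0, 10, 30, 60):
    print(t, round(exponential(t), 1), round(logistic(t), 1))
```

At t = 10 the two curves are nearly indistinguishable; by t = 60 the exponential has shot past 160,000 while the logistic has leveled off just under 1000.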

Bella: You gotta give me some answers.

Edward: “Yes”; “No”; “To get to the other side”; “1.77245…”

Bella (interrupting): I don’t want to know what the square root of pi is.

Edward (surprised): You knew that?

— Twilight

March 14 (month 3, day 14 of each year) is the day math-nerds celebrate the number *π* (3.14…), and you might be one of them. But if you’re getting tired of your *π* served plain, why not spice things up by combining the world’s favorite nerdy number with the world’s favorite nerdy operation?

The square root of *π* has attracted attention for almost as long as *π* itself. When you’re an ancient Greek mathematician studying circles and squares and playing with straightedges and compasses, it’s natural to try to find a circle and a square that have the same area. If you start with the circle and try to find the square, that’s called squaring the circle. If your circle has radius *r*=1, then its area is *πr*^{2} = *π*, so a square with side-length *s* has the same area as your circle if *s*^{2 } = *π*, that is, if *s* = sqrt(*π*). It’s well-known that squaring the circle is impossible in the sense that, if you use the classic Greek tools in the classic Greek manner, you can’t construct a square whose side-length is sqrt(*π*) (even though you can approximate it as closely as you like); see David Richeson’s new book listed in the References for lots more details about this. But what’s less well-known is that there are (at least!) two other places in mathematics where the square root of *π* crops up: an infinite product that on its surface makes no sense, and a calculus problem that you can use a surface to solve.

**A NONSENSICAL FACTORIAL**

Factorials are a handy shortcut in many counting problems. If someone asks you how many different ways there are to order the 52 cards of a standard deck, the answer is “52 factorial” (meaning 52 × 51 × 50 × … × 3 × 2 × 1, often written as “52!”); this answer takes a lot less time to say than “eighty unvigintillion, six hundred and fifty-eight vigintillion, …, eight hundred and twenty-four trillion”. But how should we define *n*! if *n* is not a counting number? What if, say, *n* is negative one-half?
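If you want to see that tongue-twister of a number for yourself, Python’s exact integer arithmetic obliges:

```python
import math

deck_orderings = math.factorial(52)
print(deck_orderings)   # a 68-digit number, about 8.07 * 10**67

# "Eighty unvigintillion, six hundred and fifty-eight vigintillion, ..."
# means the two leading three-digit groups are 80 and 658:
print(deck_orderings // 10 ** 66)        # 80
print(deck_orderings // 10 ** 63 % 1000) # 658
```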

A practical-minded person might gibe that someone who wants to know how many ways there are to order a stack consisting of negative one-half cards isn’t playing with a full deck. But this kind of craziness is often surprisingly fruitful; the symbolisms mathematicians come up with sometimes take on a life of their own, and living things want to grow. Or, as the great mathematician Leonhard Euler wrote (unless he didn’t^{1}), “My pen is a better mathematician than I am,” meaning that notations sometimes precede understanding. By trying to extend the factorial function to numbers like −1/2 that aren’t counting numbers, Euler was led to invent^{2} the gamma function, which is important not just in pure mathematics but in applications.

Specifically, Euler showed^{3} that when *n* is a positive integer, *n*! is equal to the integral ∫_{0}^{∞} *x*^{n} *e*^{−}^{x} *dx*.

For those of you who know coordinate geometry but don’t know calculus, this expression represents the area bounded by the vertical axis *x* = 0, the horizontal axis *y* = 0, and the graph of the curve *y* = *x*^{n}*e*^{−}^{x}. Here *e* = 2.718… is the other famous nerdy number of math, dominating the world of exponentials and logarithms the way *π* rules the world of sines and cosines. The cool thing about that expression is that it makes sense when you replace *n* by −1/2 (corresponding to the area under the curve *y* = *x*^{−1/2}*e*^{−}^{x } in the picture below), and gives the value sqrt(*π*).

What’s more, if you find other integrals that coincide with *n*! when *n* is a positive integer, they’ll give you sqrt(*π*) when you replace *n* by −1/2. Metaphorically you could say that even though −1/2 times −3/2 times −5/2 times … times 3 times 2 times 1 isn’t equal to any number, the number it’s *trying* to equal is sqrt(*π*).
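You don’t have to take my word for it: Python’s standard library exposes Euler’s function as `math.gamma`, which is shifted by one relative to the factorial (gamma(*n*+1) = *n*!), so the value that (−1/2)! is “trying to equal” is gamma(1/2):

```python
import math

# gamma(n + 1) agrees with n! at the counting numbers...
print([math.gamma(n + 1) for n in range(6)])
print([math.factorial(n) for n in range(6)])

# ...and at n = -1/2 it produces the square root of pi:
print(math.gamma(0.5))
print(math.sqrt(math.pi))
```

The last two printed values agree (to the limits of floating-point arithmetic): both are 1.77245…, the number from Edward’s list.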

**A NONSENSICAL SUM**

Let’s consider a simpler example of an expression that’s “trying to equal” something. If *r* is a positive number that’s less than 1, then the infinite sum 1−*r*+*r*^{2}−*r*^{3}+… (an alternating geometric series) converges to 1/(1+*r*) in the sense that the partial sums 1, 1−*r*, 1−*r*+*r*^{2}, 1−*r*+*r*^{2}−*r*^{3},… get ever-closer to 1/(1+*r*) as the number of terms gets ever-bigger.^{4} Now, if *r* is 1, then 1/(1+*r*) makes sense and equals 1/2, but the alternating sum 1−1+1−1+… does not get ever-closer to 1/2 or to any other number. The partial sums 1, 1−1, 1−1+1, 1−1+1−1, … just vacillate between two values instead of converging: 1, 0, 1, 0, …
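For instance, with *r* = 1/2 a few lines of Python show the partial sums settling down on 1/(1+*r*) = 2/3:

```python
# Partial sums of 1 - r + r**2 - r**3 + ... for r = 1/2,
# compared against the closed form 1/(1 + r)
r = 0.5
partial, term = 0.0, 1.0
for k in range(20):
    partial += term
    term *= -r

print(partial)       # approaches 2/3 as more terms are added
print(1 / (1 + r))   # 0.6666666666666666
```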

But what if we change the definition of convergence? Mathematician Ernesto Cesàro proposed a more permissive definition in which we take a sequence of numbers that annoyingly refuses to converge and replace it by a new, more tractable sequence whose *n*th term is the *average* of the first *n* terms of the old sequence.^{5}

If we apply Cesàro’s procedure to the sequence 1,0,1,0,… we get 1/1, (1+0)/2, (1+0+1)/3, (1+0+1+0)/4, … which sure enough converges to 1/2. And remember, 1/2 is exactly what we got from 1/(1+*r*) when we replaced *r* by 1.
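Cesàro’s averaging procedure is short enough to watch in action:

```python
# The partial sums of 1 - 1 + 1 - 1 + ... vacillate: 1, 0, 1, 0, ...
partial_sums = [(k + 1) % 2 for k in range(1000)]

# Cesàro's remedy: replace the n-th term by the average of the first n terms.
cesaro = [sum(partial_sums[: n + 1]) / (n + 1)
          for n in range(len(partial_sums))]

print(cesaro[:4])   # [1.0, 0.5, 0.6666666666666666, 0.5]
print(cesaro[-1])   # 0.5
```

The averaged sequence wobbles ever more gently around 1/2 and converges to it, even though the original sequence never settles down.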

So I’ve shown you two different shady methods of assigning a value to the non-converging sum 1−1+1−1+… . One method considers the more general sum 1−*r*+*r*^{2}−*r*^{3}+…, finds an expression for it that’s valid for all positive *r*<1, and then substitutes *r*=1; the method is called Abel summation. The other method replaces the ordinary notion of summation taught in calculus classes by Cesàro summation. And here’s an even simpler shady proof: Let *x* = 1 − 1 + 1 − 1 + … . Then

1 − *x* = 1 − (1 − 1 + 1 − 1 + …) = 1 − 1 + 1 − 1 + 1 − … = *x*,

so 1 − *x* = *x*, which yields *x* = 1/2.

What’s interesting is that even though these shady methods are shady in different ways, they all give the same value. This phenomenon (different shady methods arriving at the same answer) crops up a lot at the frontiers of math, and it often points to mathematical concepts that have not yet come into view. When we finally climb the hill that hid those concepts from view, we find that there’s nothing illegitimate about those concepts; they’re just more mathematically technical than, and not as attention-grabbing as, a formula like “1−1+1−1+…=1/2”. Such a formula, presented out of context, with all the sense-making scaffolding yanked away, is sort of like Lewis Carroll’s Cheshire cat: all that’s left is a (possibly infuriating) smirk.

An infamous example of mathematical YouTube sensationalism is the formula 1+2+3+… = −1/12 (and its less-celebrated sibling 1+1+1+… = −1/2). When equations like these are hyped outside of their proper context, math-fans sometimes respond with anger: “Those bastards are changing the rules again!” What often isn’t explained (because it’s fairly technical) is what the new rules are. What can and should be explained is that this rules-change is analogous to what you’d see if you were an American watching American football on TV and then changed the channel to watch Australian football. Well-educated sports fans don’t fume “They’re playing it wrong! Why, even my *gym teacher* knows more about football than these players do!”; they recognize that the Australians are playing a different sport. Same here, except that the name of the sport is “zeta-function regularization”^{6}, it’s played with numbers, and the players are analytic number theorists, only some of whom are Australian.

**LORD KELVIN’S DEFINITION OF A MATHEMATICIAN**

The physicist William Thomson, better known as Lord Kelvin, was a big fan of mathematics, calling it “the etherealization of common sense” and “the only good metaphysics”. According to an anecdote recounted by his biographer S. P. Thompson, Kelvin was a bit of an awestruck fanboy when it came to mathematicians themselves:

*Once when lecturing to a class he [Lord Kelvin] used the word “mathematician,” and then interrupting himself asked his class: “Do you know what a mathematician is?” Stepping to the blackboard he wrote upon it: *

∫_{−∞}^{∞} *e*^{−}^{x}^{2} *dx* = sqrt(*π*)

*Then putting his finger on what he had written, he turned to his class and said: “A mathematician is one to whom that is as obvious as that twice two makes four is to you.”*

This formula (usually attributed to Gauss) is important and fairly well known, but I don’t think I know a single mathematician who regards the formula as obvious, so I don’t think Kelvin’s definition of a mathematician is a good one. Here’s an alternative definition for your consideration: A mathematician is one who recognizes the difference between what is obvious and what is merely familiar. Or: A mathematician is one who recognizes the difference between what is obvious and what one has come to understand in stages, by means of a nontrivial chain of trivial steps.

This seems like an ungrateful way to treat a scientist who, when all is said and done, was just trying to praise my kind. But a mathematician is one to whom flattery is annoying if it is inaccurate. (See, I can overgeneralize too!)

The expression ∫_{−∞}^{∞} *e*^{−}^{x}^{2} *dx* represents the area under the curve *y* = *e*^{−}^{x}^{2}.

The ∫ notation, which we saw earlier, is due to Leibniz, who chose a stylized version of the letter “*s*” to commemorate the fact that the way we can compute such areas is by first approximating them as __s__ums, and by then atoning for the error of the approximation by using ever-better approximations and seeing what the approximations converge to.

**SOLVING A PROBLEM BY MAKING IT SIMPLER**

Let’s talk about sums for a minute. Specifically, consider the problem of adding together all the numbers in the first four rows and first four columns of the multiplication table:

1 2 3 4

2 4 6 8

3 6 9 12

4 8 12 16

A trick for computing the sum is to consider that the product (1 + 2 + 3 + 4) × (1 + 2 + 3 + 4), if expanded out by the general distributive law, would have the sixteen numbers 1×1, 1×2, …, 4×3, and 4×4 as its constituent terms, so that the sum of the sixteen numbers must be 10 × 10, or 100. (“A mathematician is someone who works hard at being lazy,” said Murray Klamkin, riffing on George Pólya.)
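Two lines of Python confirm that the lazy route and the brute-force route agree:

```python
# Sum all sixteen entries of the 4-by-4 multiplication table directly...
direct = sum(i * j for i in range(1, 5) for j in range(1, 5))

# ...and via the lazy trick: it's (1 + 2 + 3 + 4) squared.
lazy = sum(range(1, 5)) ** 2

print(direct, lazy)   # 100 100
```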

You might want to apply the same idea to sum the numbers in the infinite array

1 1/2 1/4 1/8 …

1/2 1/4 1/8 1/16 …

1/4 1/8 1/16 1/32 …

1/8 1/16 1/32 1/64 …

…

(see Endnote #7).
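(If you’d rather work the puzzle out for yourself first, skip this snippet.) Numerically, the same distributive-law trick applies: truncating the array at 30 rows and columns, the double sum is extremely close to the square of a single geometric series.

```python
# Entry in row i, column j of the array is 1 / 2**(i + j), for i, j = 0, 1, 2, ...
# Truncate the array at N rows and N columns:
N = 30
total = sum(1 / 2 ** (i + j) for i in range(N) for j in range(N))

# The double sum is the square of the single geometric series
# 1 + 1/2 + 1/4 + ... = 2:
print(total)                                   # very nearly 4
print(sum(1 / 2 ** i for i in range(N)) ** 2)  # very nearly 4
```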

This way of summing the numbers in an array is a cute trick, and it’s the sort of trick you see quite often in mathematics, where you reduce a two-dimensional problem to a one-dimensional problem, or more broadly solve a problem by reducing it to something that looks easier. But how often do you solve a problem by “reducing” it to something that looks harder? That counterintuitive tactic provides the nicest way I know to prove the famous formula of Gauss that Lord Kelvin quoted.

**SOLVING A PROBLEM BY MAKING IT MORE COMPLICATED**

Let *A* be the area between the *x*-axis and the curve *y* = *e*^{−}^{x}^{2}, represented by Gauss’ integral. Then multivariate calculus can be used to show that *A*^{2} is the volume between the *x*,*y*-plane and the surface *z* = *e*^{−}^{x}^{2}*e*^{−}^{y}^{2}.

Using the laws of exponents, we can rewrite that right-hand side as *e*^{−(}^{x}^{2}^{ +}^{y}^{2}^{)}. This crucial step reveals that the surface has a surprising symmetry: it’s rotationally symmetric around the *z*-axis.^{8} That fact enables calculus students to treat the space between the *x*,*y*-plane and the surface *z* = *e*^{−(}^{x}^{2}^{ +}^{y}^{2}^{)} as a solid of revolution, and to compute its volume using the method of cylindrical shells. One gets *A*^{2} = *π*, from which it follows that *A* = sqrt(*π*): the formula on Lord Kelvin’s blackboard. (For more details, see John Cook’s writeup.)
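For readers who like to see a claim checked numerically, here is a crude midpoint Riemann sum for the shell integral, in Python (the step size and cutoff are arbitrary choices of mine; the tail beyond *r* = 10 is negligible):

```python
import math

# Volume under z = e**(-(x**2 + y**2)) via cylindrical shells:
#   A**2 = integral from 0 to infinity of 2 * pi * r * e**(-r**2) dr
dr = 1e-4
volume = sum(2 * math.pi * r * math.exp(-r * r) * dr
             for r in (k * dr + dr / 2 for k in range(100_000)))

print(volume)              # very nearly pi
print(math.sqrt(volume))   # hence A is very nearly sqrt(pi)
print(math.sqrt(math.pi))
```

The exact computation is even cleaner: the antiderivative of 2*πre*^{−}^{r}^{2} is −*πe*^{−}^{r}^{2}, which runs from −*π* up to 0, so the volume is exactly *π*.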

I would modify Kelvin’s adage to say that a mathematician is someone who, having learned the preceding derivation, may not be able to remember the details, but remembers that circles play a role, and that the value of the integral therefore involves *π*. Or perhaps a mathematician is someone who finds the proof beautiful, irrespective of the beauty of the formula itself.

Gauss’ formula isn’t just a mathematical curiosity: the expression *e*^{−}^{x}^{2} is closely related to the famous bell-shaped curve of statistics, and the fact that the area beneath it is sqrt(*π*) explains where the *π* in basic statistical formulas comes from.

To go back to the classic problem with which this essay began: You can’t *square* the *circle*, but you can do something much more important, namely, you can prove

∫_{−∞}^{∞} *e*^{−}^{x}^{2} *dx* = sqrt(*π*)

by interpreting the *square* of the left hand side as a volume and then computing that volume using *circles*.

To give Lord Kelvin his due, let me credit him for praising mathematicians for conceptual understanding, even if his praise strikes me as excessive. His adage is a lot better than something along the lines of “A mathematician is someone who knows the first half-dozen digits of the square root of *π*.”

*Thanks to Henry Baker, Michael Collins, David Feldman, Sandi Gubin, David Jacobi, Kerry Mitchell, Cris Moore, Ben Orlin, Evan Romer, James Tanton, and Glen Whitney.*

Next month: Air from Archimedes.

**ENDNOTES**

#1. Can anyone provide a source, and/or the original non-English version of the aphorism?

#2. Euler defined Γ(*z*) as

Γ(*z*) = ∫_{0}^{∞} *x*^{*z*−1} *e*^{−*x*} *dx*,

which leads to a slightly annoying off-by-one issue: instead of Γ(*n*) = *n*! we have Γ(*n*) = (*n*−1)!. Another issue worth mentioning is that Euler’s definition only works for positive values of *z*; if *z* ≤ 0 the area under the curve becomes infinite. Fortunately a technique called analytic continuation can be used to figure out what expressions like Γ(−1/2) are “trying” to equal, but discussing that would take me too far from my theme.

Actually, it’s not too hard to guess what we might want Γ(−1/2) to equal: the formula *n*! = *n* (*n*−1)! corresponds to the formula Γ(*z*+1) = *z* Γ(*z*), which (with *z* = −1/2) tells us that Γ(−1/2) should be Γ(1/2)/(−1/2) = sqrt(*π*)/(−1/2) = −2 sqrt(*π*).
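Python’s `math.gamma` implements the analytically continued function, so both the recurrence and the guessed value of Γ(−1/2) can be checked directly (a quick sketch; the sample points are arbitrary):

```python
import math

# The recurrence Gamma(z+1) = z * Gamma(z), checked at a few points
# against Python's built-in gamma function.
for z in (0.5, 1.5, 2.5, 3.0):
    assert math.isclose(math.gamma(z + 1), z * math.gamma(z))

# Gamma(1/2) = sqrt(pi), and running the recurrence backwards gives
# Gamma(-1/2) = Gamma(1/2) / (-1/2) = -2 * sqrt(pi).
print(math.gamma(0.5))    # ~ 1.7724538509, i.e. sqrt(pi)
print(math.gamma(-0.5))   # ~ -3.5449077018, i.e. -2 * sqrt(pi)
```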

Here’s a plot of the gamma function (notice that it is undefined when *z* is zero or a negative integer):

The values of Γ(*z*) for *z* = 1/2, 3/2, etc. play a role in the general formula for the volume of an *n*-dimensional ball of radius *R*. This volume is given by

*π*^{*n*/2} *R*^{*n*} / Γ(*n*/2 + 1).

You might enjoy using the formula Γ(*z*+1) = *z* Γ(*z*) to compute Γ(3/2) and Γ(5/2) (given that Γ(1/2) = sqrt(*π*)) and then use the values Γ(1) = 1, Γ(3/2) = sqrt(*π*) / 2, Γ(2) = 1, and Γ(5/2) = (3 sqrt(*π*)) / 4 to compute the volume of an *n*-dimensional ball for *n* = 0, 1, 2, and 3. Curiously, plugging in *n* = −1 tells us that the “volume of a (−1)-dimensional ball” of radius *R* is trying to equal 1/(*πR*), but I have no idea what this might mean!
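Here’s a short Python sketch of that exercise (the helper name `ball_volume` is mine), using `math.gamma` to evaluate the volume formula for *n* = 0, 1, 2, 3 and for the curious case *n* = −1:

```python
import math

def ball_volume(n, R):
    """Volume of an n-dimensional ball of radius R:
    pi^(n/2) * R^n / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) * R ** n / math.gamma(n / 2 + 1)

for n in range(4):
    print(n, ball_volume(n, 1.0))
# n = 0: 1 (a point), n = 1: 2 (a segment of length 2R),
# n = 2: pi (a disk), n = 3: 4*pi/3 (a solid ball)

# The curious case n = -1: the formula evaluates to 1/(pi*R).
print(ball_volume(-1, 2.0), 1 / (math.pi * 2.0))
```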

#3. Some facts about the gamma function are hard to prove, but one that’s doable by first-year calculus methods is the formula Γ(1) = 1 (using improper integrals). It’s only a little bit harder to prove the formula Γ(*z*+1) = *z* Γ(*z*) using integration by parts. With these two formulas, we can go on to prove

Γ(2) = 1 Γ(1) = 1,

Γ(3) = 2 Γ(2) = 2 × 1,

Γ(4) = 3 Γ(3) = 3 × 2 × 1,

etc. This shows that Γ(*n*) is indeed equal to (*n*−1)!.

#4. When *r* is between 0 and 1, the partial sum 1−*r*+*r*^{2}−…±*r*^{*n*} times 1+*r* equals 1±*r*^{*n*+1}, so the partial sum equals (1±*r*^{*n*+1})/(1+*r*), which approaches 1/(1+*r*) as *n* goes to infinity; that is, the series 1−*r*+*r*^{2}−… converges to 1/(1+*r*).

#5. Cesàro’s definition is a conservative extension of the original limit concept. If you have a sequence that converges to some limit, then its “Cesàro-ization” converges to the same limit. The value of Cesàro’s method lies in the fact that the Cesàro-ization of a divergent (i.e., non-convergent) sequence is sometimes convergent. But the averaging trick isn’t a panacea; lots of divergent sequences remain divergent after you take those running averages. For instance, the series 1 − 2 + 4 − 8 + … converges to 1/3 by Abel summation, but no matter how many times you apply Cesàro’s trick to the sequence of partial sums, you get a divergent sequence. And neither Cesàro’s trick nor Abel’s lets you assign a meaningful finite number to a sum like 1+1+1+… or 1+2+3+…
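To watch Cesàro’s trick succeed and fail in the same breath, here’s a small Python sketch (the helper `cesaro` is mine): it Cesàro-izes the partial sums of Grandi’s series 1 − 1 + 1 − 1 + … (getting 1/2) and of 1 − 2 + 4 − 8 + … (getting divergence even after applying the trick twice):

```python
from itertools import accumulate

def cesaro(seq):
    """Running averages (the Cesaro-ization) of a sequence."""
    return [s / (i + 1) for i, s in enumerate(accumulate(seq))]

# Partial sums of Grandi's series 1 - 1 + 1 - 1 + ... are 1, 0, 1, 0, ...
grandi = [1 if i % 2 == 0 else 0 for i in range(10000)]
print(cesaro(grandi)[-1])   # 0.5: averaging tames this divergent sequence

# Partial sums of 1 - 2 + 4 - 8 + ... are 1, -1, 3, -5, 11, ...
partial = list(accumulate((-2) ** k for k in range(60)))
once = cesaro(partial)
twice = cesaro(once)
print(abs(twice[-1]))   # astronomically large: averaging twice didn't help
```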

#6. The black magic of zeta-regularization can also be used to show that “infinity factorial” (i.e., the product of all the positive integers) is trying to equal sqrt(2*π*) and that the product of all the primes is trying to equal 4*π*^{2}. See the article by E. Muñoz García and R. Pérez Marco listed in the References.

#7. The sum arises as the expansion of 1+1/2+1/4+1/8+… times 1+1/2+1/4+1/8+…, and since 1+1/2+1/4+1/8+… equals 2 (in the sense that the partial sums converge to 2), the sum of all those numbers equals 2 times 2, and as Lord Kelvin said, “twice two makes four”. On the other hand, we can take that table of numbers and group it by diagonals to get 1×1 + 2×1/2 + 3×1/4 + 4×1/8 + … So we’ve shown that the infinite sum 1/1 + 2/2 + 3/4 + 4/8 + 5/16 + … (where the numerators increase by adding 1 and the denominators increase by doubling) converges to 4. A mathematician is someone who thinks this is cute. Not “obvious” — just cute.
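If you’d like to see the regrouped series settle down to 4 numerically, a one-line sketch suffices (sixty terms is already far more than enough):

```python
# Partial sum of 1/1 + 2/2 + 3/4 + 4/8 + 5/16 + ..., i.e. sum of n / 2^(n-1).
total = sum(n / 2 ** (n - 1) for n in range(1, 60))
print(total)   # ~ 4.0
```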

#8. Suppose the points (*x*,*y*) and (*x*′,*y*′) are at equal distance from the origin; call that distance *r*. Then we have *x*^{2}+*y*^{2} = *r*^{2} = (*x*′)^{2}+(*y*′)^{2}, so *e*^{−(*x*²+*y*²)} equals *e*^{−((*x*′)²+(*y*′)²)}, which tells us that the three-dimensional graph of the function *e*^{−(*x*²+*y*²)} has rotational symmetry. One thing I love about the proof of Gauss’ formula is the way it brings together so much mathematics: *e*, *π*, and even the Pythagorean Theorem!

**REFERENCES**

E. Lamb, Does 1+2+3… Really Equal −1/12?, Roots of Unity (blog hosted by Scientific American), 2014.

E. Muñoz García and R. Pérez Marco, “The Product Over All Primes is 4*π*^{2} “, Commun. Math. Phys. 277, 69–81 (2008).

David Richeson, Tales of Impossibility: The 2000-Year Quest to Solve the Mathematical Problems of Antiquity, Princeton University Press (2019).

Videos about “−(1/2)!”:

Presh Talwalkar, “What is the Factorial of 1/2?”

blackpenredpen, “The Gamma function & the Pi function”

blackpenredpen, “But can you do negative factorial?”

Euler’s Academy, “Gamma Function: (−1/2)!”

Videos about 1+2+3+… = −1/12:

Numberphile, ASTOUNDING: 1+2+3+4+5+… = −1/12

Mathologer, Numberphile v. Math: the truth about 1+2+3+…=−1/12

— *The Exorcist* (William Peter Blatty)

Sometimes a key advance is embodied in an insight that in retrospect looks simple and even obvious, and when someone shares it with us our elation is mixed with a kind of bewildered embarrassment, as seen in T. H. Huxley’s reaction to learning about Darwin’s theory of evolution through natural selection: “How extremely stupid not to have thought of that.”

This phenomenon often arises as one learns math. Mathematician Piper H writes: “The weird thing about math is you can be struggling to climb this mountain, then conquer the mountain, and look out from the top only to find you’re only a few feet from where you started.” In the same vein, mathematician David Epstein has said that learning mathematics is like climbing a cliff that’s sheer vertical in front of you and horizontal behind you. And mathematician Jules Hedges writes: “Math is like flattening mountains. It’s really hard work, and after you’ve finished you show someone and they say “What mountain?””

These descriptions apply both to people who are learning math from books and to people working at the frontier of the known, discovering entirely new things. A lot of the work one does isn’t visible to others because sometimes you need to explore a terrain thoroughly before you can find the straight path through it. I’m reminded of the parable of the king who asked the greatest artist of the land to create for him a painting of a bird.^{1} The artist said “Come back to me in a year and I will give you your painting.” When the king returned, the artist said “I am still not ready; give me another year and I will give you your painting.” This happened several times, until finally the king said “Give me the painting now, or I will have your head cut off!” Thereupon the artist whipped out a brush and, in a few quick strokes, created the most beautiful painting of a bird the king had ever seen. Astonished, the king asked “If this was so easy for you, why did you make me wait so long?” By way of answer, the artist led the king to another room containing hundreds of sketches of birds. The artist’s inspired creation only seemed to come from nowhere; it grew out of a huge mass of preparation, hidden from sight.

In math, what often happens is that we try to solve a problem using one approach, and then another, and then another, failing each time, until we finally hit on an approach that works, possibly after months or years. In a different sort of mathematical culture, researchers might be encouraged to discuss those failures and the lessons they learned from them, but in our culture, it is customary to describe only the approach that worked. This custom has the unfortunate effect of making advances seem like strokes of genius rather than fruits of effort. Then other people’s responses tend to be less like “How extremely stupid not to have thought of that” and more like “How on earth could anyone ever think of that?”^{2}

**FEFFERMAN’S DEVIL**

Mathematician Charles Fefferman has an analogy for math that I like a lot (and in fact it’s the reason why I put a chessboard grid on the pseudosphere that serves as this blog’s logo); he says that doing math research is like playing chess with the Devil. Or rather, chess with *a* devil who, although much smarter than you, is bound by an ironclad rule: although *you* are allowed at any stage to take back as many moves as you like and rewind the game to an earlier stage, the devil cannot. In game after game, the devil trounces you, but if you learn from your mistakes, you can turn his intelligence against him, forcing him to become your chess tutor. Eventually you may run out of mistakes to make and find a winning line of play. Someone who reads a record of the final version of the game (the one in which you win) may marvel at some cunning trap you set and ask “How on earth did you know that this would lead to checkmate ten moves in the future?” The answer is, you already had a chance to explore that future.^{3}

In the same fashion, when you try to construct a proof, you often go down blind alleys, but if in the end you reach your goal, you can devise a straight path. In this way, we may see Fefferman’s Devil as unknowingly laboring in the service of Paul Erdős’ God (whom I wrote about last month): a proof that seems to exhibit godlike foresight could be the result of a devilish amount of preparatory fumbling.

Erdős lived for the moments when he caught a glimpse of God’s book and the gnomic proofs it contains, but I would prefer a book that shows the process by which mere mortals find their way to such proofs, surmounting obstacles, dismissing distractions, and maintaining hope along the way. What secrets the erasers and wastebaskets of mathematicians could tell us, if only they could talk!

**THE DEVIL’S BOOK**

Mathematician Doron Zeilberger has noticed the mathematical community’s preference for elegant proofs over proofs that solve problems by brute force, and he’s not happy about it. As a natural contrarian, he’s suspicious of consensus and believes (or is willing to pretend to believe) that the future progress of mathematics depends on ugly computer proofs, not the kind of beautiful proofs people like. In his view, the finding of beautiful proofs will become an eccentric pastime of human mathematicians, while their electronic counterparts, untrammeled by our species’ odd notions of beauty, will make the real advances. Over time, our tools may become our masters, and we their pets.

If Zeilberger is right, mathematical historians of the future, be they humans or computers, will view the year 1976 as pivotal. That was the year in which mathematicians Kenneth Appel and Wolfgang Haken gave the mathematical community a solution to the century-old Four Color Problem that involved a huge hunk of brute-force computation. The problem itself can be explained to a child in five minutes; the proof found by Appel, Haken, and their computer would take years of toil for a human to verify with pencil and paper. A more recent and extreme example of an unreadable proof is described by Kevin Buzzard in his essay “A computer-generated proof that nobody understands”.

Zeilberger doesn’t just predict a brave new mathematical world; he’s doing his best to bring it into being. Along with the late Herb Wilf, Zeilberger created a mathematical technology for automating the proofs of a broad class of equations (see their book “A=B” written with Marko Petkovšek), and over the course of his career he has delighted in finding brute-force approaches to problems. For instance, consider Morley’s trisector theorem, a beautiful piece of Euclidean geometry that wasn’t discovered until the 19th century. (I’ll tell you more about it next Thirdsday.) There’s a beautiful proof found by John Conway, but Zeilberger wasn’t interested in finding a beautiful proof; he wanted a proof that it doesn’t take a Conway to discover. So he found the ugliest proof: a brute-force algebraic verification that gives us complete certainty that Morley’s theorem is true but zero insight into *why* it is true.

Zeilberger wrote that the devil, too, has a book, and he imagined that his proof of Morley’s theorem belonged there. This book would contain all the boring, inelegant proofs missing from God’s book as conceived by Erdős. Actually, Zeilberger called both books notebooks, and this idea of the two books as evolving documents fits in nicely with thoughts about ugly and beautiful mathematics voiced by mathematician G. H. Hardy. Hardy wrote “There is no permanent place in the world for ugly mathematics” while conceding that temporary ugliness is an essential feature of doing mathematics; you can’t build cathedrals without putting up scaffolds.

Haken and Appel’s proof didn’t end the story of the Four Color Theorem; their proof led to a shorter proof, and the quest for even shorter proofs continues. Meanwhile, Erdős’ love of elegance didn’t stop him from being phenomenally productive, and most of the proofs he found fell short of his high standard for “Book proofs”. So maybe we don’t have to choose between God’s book and the devil’s? Maybe we can honor both?

**THE LAST LAUGH**

There’s a sense in which the Devil claims the final victory. Although there is not and may never be an uncontroverted notion of what constitutes mathematical elegance, there are objective ways to measure the simplicity of a proof as a kind of surrogate for elegance, and we might imagine that for every simple problem there is a simple solution that we could discover, at least in principle, by staying at the devil’s chess-table long enough. That is, for every easy-to-state theorem we might hope that there would be a proof that isn’t necessarily easy to find but which, once found, could be verified by humans (possibly with computer assistance). And here comes the real deviltry. Thanks to the tricks discovered in the 20th century by Kurt Gödel, Alan Turing, and Gregory Chaitin, reason can be turned against itself to show that there are bound to be theorems whose proofs are all obscenely long in comparison with the length of the theorem itself. This is related to Turing’s discovery that there is no sieve to unerringly sift the provable from the disprovable, and Gödel’s discovery that in any sufficiently advanced formal system there will be propositions that are *undecidable*: neither provable nor disprovable.^{4}

We don’t know the location of the border between the kind of math we care about and the kind Gödel et al. warned us about. Logician Harvey Friedman thinks the border may be closer than we think, and has spent the past few decades devising ever-more disquieting examples of problems that are haunted by the ghost of undecidability. We may someday find an easily-stated truth with no proof that can be uttered (let alone checked) in a lifetime, and we may never recognize the theorem as true. Our human mathematics may be a game limited to the shallows of reason; the farther out we wade, the greater the chances of being pulled out to sea by the undertow. Computers may enable us to go a little deeper, but there are limits to what beings in our universe, however constituted, can hope to do in the space and time allotted to us. Beyond what we can know, or ever will be able to know, there is a Void with a ragged beginning and no end. Is it laughing at us?

*Thanks to Kevin Buzzard, Sandi Gubin, Piper H, Cris Moore, Ben Orlin and James Tanton.*

Next month: The Square Root of Pi.

**ENDNOTES**

#1. I’m reconstructing this parable from memory, so I may have some details wrong. I couldn’t find this on the web, but surely one of you can find the source!

#2. I recently came across a great quote from mathematician Gian-Carlo Rota: “Philosophers and psychiatrists should explain why it is that we mathematicians are in the habit of systematically erasing our footsteps.” I discuss the phenomenon in my essay “The Genius Box”.

#3. Contrast Fefferman’s harmless devil with the villains in fantasy fiction, who can and will kill us and the people we care about. If fantasy novels were an unbiased sample of imaginary worlds, the vast majority would end mid-book, with the main character falling prey to some otherworldly peril her past experiences hadn’t prepared her for. The books we actually read are governed by a monumental amount of survivorship bias. In my zeroeth Mathematical Enchantments essay I wrote that math is my consolation for living in a world without magic, but really, I’m a big enough coward that if I were offered a passport to magical realms I’d probably turn it down. The worlds of fantasy that I like best have rules; if you run afoul of those rules, you die, and there is no reset button. Given a choice of adversaries, I’ll take Fefferman’s devil anytime.

#4. Actually, there is an escape clause from undecidability (but you won’t like it): given a formal system for proving theorems about counting-numbers-and-things-like-that, there may be an infallible way for us to recognize which theorems are provable in our system and which aren’t, but only if the task becomes trivial (all theorems are provable, none aren’t) and pointless (if all theorems are provable in our system, then provability can’t be telling us much about what’s true). That’s what happens if our formal system is inconsistent. Very few mathematicians are seriously worried that the systems that undergird mathematics (such as Peano Arithmetic or Zermelo-Fraenkel set theory) might harbor contradictions, and most of us have faith that even if these particular systems turn out to be flawed, the flaws can be fixed. But if no fix exists — if there is no way to put our mathematics on firm foundations — then I suspect the devil’s laughter fills the mathematical universe from one nonexistent end of time to the other.


— Paul Erdős

Creating gods in our own image is a human tendency mathematicians aren’t immune to. The famed 20th century mathematician Paul “Uncle Paul” Erdős, although a nonbeliever, liked to imagine a deity who possessed a special Book containing the best proof of every mathematical truth. If you found a proof whose elegance pleased Erdős, he’d exclaim “That’s one from The Book!”

I’m a fan of Erdős, but today I’ll argue that the belief that every theorem has a best proof is misguided.^{1}

Nowadays there’s a terrestrial shadow of Erdős’ celestial book, called “Proofs From THE BOOK”, and its authors Martin Aigner and Günter Ziegler explicitly disavow the idea that each theorem has a best proof. Their very first chapter contains six different proofs of the existence of infinitely many prime numbers. Ziegler, in an interview in Quanta Magazine, said: “There are theorems that have several genuinely different proofs, and each proof tells you something different.” Most mathematicians (and maybe even Erdős himself) would agree.

**A PROBLEM ABOUT TILINGS**

A great illustration of this “Let a hundred proofs bloom” point of view is provided by an article by Stan Wagon called “Fourteen Proofs of a Result About Tiling a Rectangle”. Here’s the result his title refers to (a puzzle posed and solved by Nicolaas de Bruijn): Whenever a rectangle can be cut up into smaller rectangles each of which has at least one integer side, then the big rectangle has at least one integer side too. (Here “at least one integer side” is tantamount to “at least two integer sides”, since the opposite sides of a rectangle always have the same length.)

A tiling means a dissection with no gaps or overlaps. Here’s a picture of the sort of tiling we’re talking about (taken from Wagon’s article).

We see a big rectangle tiled by small rectangles, where each of the small rectangles is marked with either an *H* (to signify that the horizontal side-length is an integer) or a *V* (to signify that the vertical side-length is an integer). I hope you’ll agree that it’s not obvious at all that the big rectangle must have an integer side! You might want to play around with the general problem for a bit to convince yourself both that it’s true *and* that it’s not obvious how to prove it.

Wagon presents fourteen elegant proofs of this result, contributed by a variety of colleagues; I’ll show you two of them (which can also be found in chapter 29 of Aigner and Ziegler’s book). In both proofs, we call the tiled rectangle *R*, and we let *a* and *b* be the width and height of *R*, respectively.

I’ll demonstrate, in two different ways, that at least one of the two numbers *a*, *b* must be a whole number. You may have an esthetic preference for one proof or the other, but neither of them is dispensable because each of them points in directions that the other doesn’t.

**THE CHECKERBOARD PROOF**

Divide the plane into ½-by-½ squares alternately colored black and white as in a checkerboard and superimpose this checkerboard with the tiled rectangle, so that the lower left corner of the tiled rectangle is a lower-left corner of a black square in the checkerboard. (Why is the checkerboard coloring relevant and helpful? Why do we want ½-by-½ squares instead of 1-by-1 squares? Wait and see!) I’ll draw the black squares as gray squares to make it easier for you to see the checkerboard and the tiling at the same time.

Here’s a proof that at least one of the two numbers *a*, *b* must be a whole number (with some of the details deferred to the Endnotes):

(1) Because each of the rectangles that tile *R* has an integer side, each tile *T* contains an equal amount of black and white; that is, the black part of *T* and the white part of *T* have the same area. (See Endnote #2 for justification.)

(2) Therefore *R* (being composed of tiles) contains an equal amount of black and white.

(3) Now ignore the tiles and focus on the *a*-by-*b* rectangle *R*. Suppose *a* and *b* aren’t integers. Let *m* and *n* be the integer parts of *a* and *b* respectively, and write *a*=*m*+*r* and *b*=*n*+*s*, with *r* and *s* between 0 and 1. Split *R* into an *m*-by-*n* rectangle, an *r*-by-*n* rectangle, an *m*-by-*s* rectangle, and an *r*-by-*s* rectangle, as in the picture below. Each of the first three rectangles has an integer side and therefore has equal amounts of black and white (as in (1)) but the fourth has more black than white. (See Endnote #3 for justification of that last assertion.)

(4) Therefore *R* contains more black than white.

(5) Since (2) and (4) contradict each other, our supposition that *a* and *b* aren’t integers is incompatible with the assumption that each of the tiles has an integer side.

(For a related approach, see Endnote #4.)
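A computational aside: the checkerboard construction is from the proof above, but the function names `G` and `imbalance` below are my own. The black-minus-white area of any axis-aligned rectangle on the ½-by-½ checkerboard factors into a product of two one-dimensional quantities, which makes steps (1) and (3) easy to check exactly with Python’s `Fraction` type:

```python
from fractions import Fraction

def G(t):
    """Integral from 0 to t of the column-coloring function g, where
    g(x) = +1 on [k, k+1/2) and -1 on [k+1/2, k+1) for each integer k.
    G has period 1, and G(t) = 0 exactly when t is an integer."""
    f = t - int(t)  # fractional part (t >= 0 assumed)
    return f if f <= Fraction(1, 2) else 1 - f

def imbalance(x0, y0, w, h):
    """(black area) - (white area) of [x0, x0+w] x [y0, y0+h] on the
    1/2-by-1/2 checkerboard whose origin is a corner of a black square.
    The double integral of g(x)g(y) splits into a product of two
    one-dimensional integrals."""
    return (G(x0 + w) - G(x0)) * (G(y0 + h) - G(y0))

F = Fraction

# Step (1): a tile with an integer side is perfectly balanced, wherever it sits.
print(imbalance(F(3, 10), F(17, 10), 2, F(9, 10)))   # 0  (integer width)
print(imbalance(F(1, 4), F(1, 10), F(7, 10), 3))     # 0  (integer height)

# Steps (3)-(4): anchored at the origin with two non-integer sides,
# a rectangle has strictly more black than white.
print(imbalance(0, 0, F(5, 2), F(5, 4)))             # 1/8
```

Since `G` has period 1, shifting a rectangle by an integer side-length leaves both factors unchanged, which is exactly why step (1) doesn’t care where the tile sits.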

**THE TILES-AND-CORNERS PROOF**

Draw the standard Cartesian coordinate frame, with horizontal and vertical axes intersecting at (0,0), and superimpose this picture with the tiled rectangle, so that the lower left corner of the tiled rectangle is (0,0). (Why are coordinates relevant and helpful? Wait and see!)

We’ll put a black dot at the center of each tile and a white dot at each tile corner whose *x*– and *y*-coordinates are both integers (let’s call a point like this an “integer corner”), and we’ll draw a dashed line from a black dot to a white dot if the black dot is at the center of a tile and the white dot is at one of the four corners of that tile.

The heart of the argument is to count the dashed lines in two different ways, with each way of counting providing different information about the total. One way to count the dashed lines is to consider how many dashed lines emanate from each black dot, and add up those numbers (in the picture above, that would give a 4 and four 2’s); the other way is to consider how many dashed lines come into each *white* dot, and add up *those* numbers (in the picture above, that would give five 2’s and two 1’s).

(1) Each tile has 0, 2, or 4 integer corners. (See Endnote #5.) So each black dot has an even number of dashed lines emanating from it.

(2) Therefore the total number of dashed lines, being a sum of even numbers, is even.

(3) Suppose *a* and *b* are non-integer, so that (*a*,0), (0,*b*), and (*a*,*b*) are not integer corners and there are no white dots there. Then the white dot at (0,0) lies on one dashed line (joining it to the black dot in the middle of the lower-leftmost tile) and every other white dot is a corner of either 2 tiles (as shown above) or 4 tiles (as shown below), so each of those other white dots lies on an even number of dashed lines.

(4) The total number of dashed lines is equal to the number of dashed lines passing into (0,0) (which is 1) plus the number of dashed lines passing into the other integer corners (which, being a sum of even numbers, is even). If you add a bunch of numbers, one of which is odd and the rest of which are even, the total is odd, so the number of dashed lines is odd.

(5) Since (2) and (4) contradict each other, our supposition that *a* and *b* are non-integer is incompatible with the assumption that each of the tiles has an integer side.
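To make the double count concrete, here’s a small Python sketch (the example tiling and the helper names are my own): it tiles a 2-by-2 rectangle by three tiles, each with an integer side, and tallies the dashed lines once from the black dots and once into the white dots.

```python
from collections import Counter

# Each tile is (x0, y0, x1, y1): a tiling of the 2-by-2 rectangle R
# in which every tile has at least one integer side.
tiles = [(0, 0, 2, 0.5),      # integer width 2
         (0, 0.5, 1, 2),      # integer width 1
         (1, 0.5, 2, 2)]      # integer width 1

def corners(t):
    x0, y0, x1, y1 = t
    return [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]

def is_integer_corner(p):
    return all(float(c).is_integer() for c in p)

# Count 1: dashed lines emanating from the black dots, i.e. the number
# of integer corners of each tile (here each tile has exactly 2).
from_black = sum(sum(map(is_integer_corner, corners(t))) for t in tiles)

# Count 2: dashed lines coming into the white dots, i.e. how many tiles
# each integer corner belongs to.
into_white = Counter()
for t in tiles:
    for p in corners(t):
        if is_integer_corner(p):
            into_white[p] += 1

print(from_black, sum(into_white.values()))  # 6 6: the two counts agree
```

Here both counts come out even, which is consistent with *R* having an integer side. If a tiling of a rectangle with two non-integer sides existed, the first count would still be even but the second would be odd, which is the contradiction in steps (2) through (5).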

**TRICKS AND METHODS**

Back in the 1920s, mathematician George Pólya wrote: “An idea that can be used only once is a trick. If one can use it more than once it becomes a method.”

What would Pólya have said about the two proofs of de Bruijn’s theorem that appear above? Is the idea of imposing a checkerboard coloring a trick or a method? What about the two-ways-of-counting idea?

Here are a couple of problems you might try to solve using those ideas.

**The Mutilated Checkerboard Problem**: Consider an 8-by-8 square from which two diagonally opposite 1-by-1 squares have been removed, with total area 64 − 2 = 62. Is there a way to tile it with 31 1-by-2 rectangles (which can be either horizontal or vertical)? See Endnote #6.

**The Half-Friendly Party Problem**: Someone tells you “I was at a party with 6 other people, and interestingly, each of us was friends with exactly half of the other 6 people.” (Assume that if person A is friends with person B, then person B is friends with person A; also assume that no person is their own friend.) Clearly the person who told you this is a nerd and a bore, but are they also a liar? See Endnote #7.

I’d modify Pólya’s dictum and say that when a trick works in multiple settings, it’s still a trick, but it goes into a big bag containing all the tricks you’ve ever seen, and a “method” you can use when attacking a new problem is rummaging through the bag and asking yourself “Which of these old tricks might solve this new problem?” Part of one’s mathematical education is filling the bag.

Pólya made a brave start at sharing his own personal bag of tricks and his general approach to problem-solving in the book “How To Solve It”, and more recently there was an effort to crowd-source the collation of mathematical problem-solving tricks at tricki.org. Say someone who’s never seen de Bruijn’s theorem before is trying to prove that there’s no way to divide a rectangle that has no integer side-lengths into smaller rectangles that each have an integer side-length, but they’re stuck. If they go to the Tricki main page and click on “What kind of problem am I trying to solve?” and then click on “Techniques for proving impossibility and nonexistence” and then click on “Invariants” and then click on “Elementary examples of invariants”, they can see both of the proofs I’ve just shown you. Unfortunately the search feature of the Tricki site is very primitive, which limits the usefulness of the site. But it’s a start.

**WHAT IS BEST?**

There’s an episode of “The Office” in which the character Jim Halpert pulls a prank on his deskmate Dwight Schrute by doing a brilliant imitation of Dwight’s picayune pomposity, asking and then answering the nonsensical question “What kind of bear is best?” The question is hilariously idiotic on many levels (Dwight himself declares the question to be “ridiculous” even though it’s an exaggerated version of the sorts of things he himself says). One aspect of the idiocy is that “best” has no clear meaning in this context. If you need a bear that can claw its way through permafrost, you probably want a polar bear; if you need one that can climb a tree, you probably don’t. “What bear is best?” deserves the response “Best *at what*?” Likewise, the question “What proof is best?” deserves the response “Best *for what*?” and “Best *for whom*?”

Let’s tackle “Best for whom?” first. As Ziegler says, “For some theorems, there are different perfect proofs for different types of readers. I mean, what is a proof? A proof, in the end, is something that convinces the reader of things being true. And whether the proof is understandable and beautiful depends not only on the proof but also on the reader: What do you know? What do you like? What do you find obvious?”

The meaning of “Best for what?” requires some elaboration. No problem or theorem is an island, and the theorem we’ve been discussing should be understood not as an isolated puzzle but as part of a family of related problems. For instance, we might go into higher dimensions and consider a tiling of a three-dimensional box *B* by smaller boxes, each of which has at least one of its three side-lengths being integers. Must at least one of the three side-lengths of *B* be an integer? The answer is yes, and both the checkerboard proof and the tiles-and-corners proof can be modified to prove this variant of the tiling-a-rectangle problem.

Here’s another variant: Look at a tiling of a three-dimensional box *B* by smaller boxes, each of which has at least **two** of its three (non-parallel) side-lengths being integers. Must at least two of the three side-lengths of *B* be an integer? The answer is again yes, and the checkerboard proof can be adapted to solve this problem — but the tiles-and-corners proof can’t.

Should we conclude from this that the checkerboard proof is superior to the tiles-and-corners proof? No! Here’s a variant that the tiles-and-corners approach handles easily but the checkerboard approach doesn’t: Show that, whenever a rectangle is tiled by rectangles each of which has at least one **rational** side, then the tiled rectangle has at least one rational side.

Neither of the two proofs of the original theorem supplants the other; each provides insights that the other lacks, and leads in directions that the other can’t. Wagon considers fourteen proofs in total, and shows that there is no best proof in the bunch; each has limitations as well as strengths.

Once we start to view math problems not as isolated puzzles but as part of a huge interconnected tapestry (a tapestry that the mathematical community is constantly exploring and extending), then we see that the idea of the One Best Proof fails to do justice to the richness of the tapestry. Which proof is best, you ask? Well, that depends: where do you want to go next in your exploration of the tapestry? If you want to head in *this* direction, solution A might be good; if you want to go *thataway*, solution B might be better.

One best proof? Sorry, Uncle Paul; I say “False”. You don’t have to disbelieve in The Book, but you should disbelieve *that*.

*Thanks to Sandi Gubin, David Jacobi, Joel Spencer, Stan Wagon, and Günter Ziegler.*

Next month (Feb. 17): Chess with the Devil.

**ENDNOTES**

#1: Maybe I’m being unfair to Erdős here, but as far as I know he never modified his original view that each theorem has just one Book proof. Then again, Ziegler (in private communication) remarks: “As I remember Uncle Paul, I think it was not important for him to be right, also in this point, but it was important to him to have a good story to tell, and he did.”

#2: We’re going to prove this claim by literally tearing it to pieces, or rather by tearing the rectangle to pieces. For definiteness we’ll focus on the case where the height of the rectangle is a whole number (since the argument for the case where the width is a whole number is essentially the same).

If there’s an *x*-by-*n* rectangle in the plane, where *x* is a real number and *n* is a whole number, you can tear it into *n* *x*-by-1 strips, so IF we can show that each of those *x*-by-1 strips contains equal amounts of black and white, THEN we’ll know that the whole *x*-by-*n* rectangle does too. (Here’s a picture of what that slicing-up looks like when *n* is 3.)

Are we done cutting the problem down to size? By no means! If an *x*-by-1 strip happens to intersect some of the vertical lines of the checkerboard, you can cut that strip into pieces, using the vertical lines of the checkerboard as cut-lines. As before, IF we can show that each of those new sliced-and-diced pieces has as much black as white, THEN we’ll be able to conclude that the whole strip does.

So now we’ve reduced the claim to tiny rectangles that fit between two consecutive vertical lines in the ½-by-½ checkerboard, and that have height 1 (here I’ve blown up the picture for intelligibility):

Since the tiny rectangle has height 1 and the black part in its middle has height ½, it’s clear that the tiny rectangle is half black and hence half white as well. (If you look at this last part of the argument, you’ll see where we need the fact that all the checkerboard squares are ½-by-½.) Rewinding the argument, we’ve shown that the *x*-by-1 slices are half black and half white, and that shows that the *x*-by-*n* rectangle is half black and half white.

#3: We want to show that if *r* and *s* are between 0 and 1, then an *r*-by-*s* rectangle with a black checkerboard square nestled in its lower left corner has more black than white. The most interesting case is when *r* and *s* are both bigger than ½, as in the picture. Write *r* = ½ + *t* and *s* = ½ + *u*, with *t* and *u* between 0 and ½.

The *r*-by-*s* rectangle consists of a black ½-by-½ square, a white *t*-by-½ rectangle, a white ½-by-*u* rectangle, and a black *t*-by-*u* rectangle. So the black area minus the white area equals (½)(½) − (*t*)(½) − (½)(*u*) + (*t*)(*u*) = (½ − *t*)(½ − *u*), which (being a product of two positive numbers) must be positive.

#4: A variant of the checkerboard proof uses the fact that the black area of *R* minus the white area of *R* can be written as the product of two numbers, *L* and *M*, where *L* is the black length of the bottom edge of *R* minus the white length of the bottom edge of *R* and *M* is the black length of the left edge of *R* minus the white length of the left edge of *R*. (By the “black length of the bottom edge of *R*” I mean the sum of the side-lengths of the black squares that adjoin the bottom edge of *R*. Similarly for the other black and white lengths.) You might check that the punchline of Endnote #3 uses this trick.

#5: Write the corners as (*x*,*y*), (*x*′,*y*), (*x*,*y*′), and (*x*′,*y*′). Suppose the tile has integer width. Then *x*′ and *x* differ by an integer, so either (*x*,*y*) and (*x*′,*y*) are both integer corner points or neither one is, and ditto for (*x*,*y*′) and (*x*′,*y*′). So the number of integer corners of the tile is 0, 2, or 4. The same conclusion holds in the case where the tile has integer height.

#6: The word “checkerboard” in the name of this classic problem gives away the trick (excuse me, method). The two removed squares have the same color (black, say), so the resulting mutilated board has more white than black squares. On the other hand, if we could tile the mutilated board with 1-by-2 rectangles, then the board would have to contain equal amounts of white and black, since each individual tile does. This contradiction shows that no such tiling is possible.

#7: How many friend-pairs were at the party? For each person *x*, we can count the number of people at the party who are friends with *x*, and then add up all those numbers; that should give us the number of friend-pairs times two, since each friend-pair gets counted twice by this method. (If *x* and *y* are friends, then *y* counts as a friend of *x* and *x* counts as a friend of *y*.) So we must get an even number.

But: each of the 7 people is friends with 3 people, and 3+3+3+3+3+3+3 is odd. Contradiction!
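The double-counting argument can be checked mechanically. Here’s a small Python sketch (the function name and the exhaustive 4-person check are mine, for illustration only):

```python
from itertools import combinations

def total_degree(n_people, friend_pairs):
    """Sum, over all people, of each person's number of friends."""
    degree = [0] * n_people
    for a, b in friend_pairs:
        degree[a] += 1
        degree[b] += 1
    return sum(degree)

# Exhaustive check on 4 people: every possible friendship graph
# has even total degree, since each friend-pair contributes 2.
all_pairs = list(combinations(range(4), 2))
for k in range(len(all_pairs) + 1):
    for chosen in combinations(all_pairs, k):
        assert total_degree(4, chosen) % 2 == 0

# But 7 people with exactly 3 friends each would force the total
# degree to be 7 * 3 = 21, an odd number -- so no such party exists.
print(7 * 3)  # 21
```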

#8: I hope to present a version of this essay at the 14th annual Gathering for Gardner in March 2020.

**REFERENCES**

Martin Aigner and Günter M. Ziegler, “Proofs from THE BOOK”, Springer (Sixth Edition 2018).

Erica Klarreich, “In Search of God’s Perfect Proofs”, Quanta Magazine.

George Polya, “How To Solve It” (1945). For a quick summary of some of the main ideas, look at the handout George Melvin created for a course he taught at Berkeley.

Stan Wagon, “Fourteen Proofs of a Result About Tiling a Rectangle”.

We got the points, but I liked the other team’s answer better. The idea of an empty salad might seem like a purely mathematical fancy, but half a dozen years later I saw a restaurant menu that offered the null salad, or rather “Nowt, served with a subtle hint of sod all” (for the unbeatable price of 0 pounds and 0 pence).^{2}

Today I’ll tell you how to make null salad — not just the tossed kind, but also the kind that’s artfully composed of stacked leafy greens, except that there aren’t any greens in the stack. The trick to preparing it is knowing when to stop, namely, before you start, and the hardest part of all isn’t doing it, but correctly counting how many ways there *are* to do it. You might think the correct count is 0, but it’s not. Coming to understand why the answer isn’t 0 is tricky; it hinges on understanding the difference between a task that’s impossible to do and a task that’s impossible to start because it’s already finished.^{3} Or, putting it differently, it’s about the difference between doing the impossible and doing nothing. There are exactly 0 ways to do the impossible, but in many mathematical settings, there’s exactly 1 way to do nothing.

**COUNTING TOSSED SALADS**

Of course the original problem is a bit silly, since it assumes that a salad that’s 90% arugula and 10% basil is the same as a salad that’s 10% arugula and 90% basil (but if we didn’t make that assumption, there’d be too many different salads to count and no clear rules for counting them). The problem also ignores the fact that some combinations of greens might not be palatable (but if we didn’t make that assumption, once again there’d be no clear rules for counting the possibilities).

The inclusion of the word “tossed” might seem incidental, but it actually serves an important mathematical role; it tells us that the ingredients are mixed together higgledy-piggledy, so that their arrangement within the bowl isn’t what we’re interested in. All that we’re supposed to care about is which ingredients get used (no matter how little) and which ingredients get left out.

Let’s stick to just arugula and basil for a bit. We have a choice about whether to include arugula or not, and we have a choice about whether to include basil or not. If we disallow the empty salad, then these choices are linked, because deciding to omit arugula would force us to include basil (and deciding to omit basil would force us to include arugula). But if we allow the empty salad, then the choices can be made independently of each other. For each of the two possible ways of deciding about arugula (arugula or no arugula?), we get two possible ways of deciding about basil (basil or no basil?). So the situation becomes symmetrical with regard to inclusion versus exclusion: there are exactly as many salads that include arugula as there are salads that exclude arugula, and ditto for basil. And that’s a good thing, because by and large symmetrical definitions are easier to work with.

Once we’ve decided that it’s expedient to include the empty salad, we find that there are 4 different salads that can be made with arugula and basil as allowed ingredients: arugula-and-basil, arugula-only, basil-only, and the empty salad.

What happens when we include celery as an allowed ingredient? Each of the 4 salads that can be made with arugula and basil as allowed ingredients gives rise to 2 different salads according to whether we use celery or not; so there are twice 4, or 8, salads that can be made with arugula, basil, and celery as allowed ingredients.

I think you see where this is going: when we add a fourth allowed ingredient (dandelion), the number of possibilities becomes twice 8, or 16, and when we add a fifth allowed ingredient (endive), the number of possibilities becomes twice 16, or 32. Thinking about this pattern, we see that the number of different salads we can make with *n* allowed ingredients is 2 times 2 times 2 times … times 2, where the number of 2’s is *n*. We write this number as 2^{n} for short.
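In code, a tossed salad is just a subset of the ingredient list, and the 2^{n} count (empty salad included) falls right out. A Python sketch, purely illustrative:

```python
from itertools import combinations

def all_salads(ingredients):
    """Every subset of the ingredient list, including the empty salad."""
    salads = []
    for k in range(len(ingredients) + 1):
        salads.extend(combinations(ingredients, k))
    return salads

greens = ["arugula", "basil", "celery", "dandelion", "endive"]
for n in range(len(greens) + 1):
    assert len(all_salads(greens[:n])) == 2 ** n

print(all_salads(["arugula", "basil"]))
# [(), ('arugula',), ('basil',), ('arugula', 'basil')]
```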

In contrast, if we’d chosen to disallow the empty salad, we would have gotten the messier answer 2^{n}−1.

**BACK TO ZERO**

Let’s pause and look at the expression 2^{n} for a bit. It’s defined as the product 2 × 2 × 2 × … × 2, where the number of 2’s is *n* and the number of multiplication signs is *n*−1. This makes sense when *n* is bigger than 1, and it even makes sense, sort of, when *n* is 1: the number of 2’s is 1 and the number of multiplication signs is 0, so our “product” looks like just “2”, whose value is clearly 2 (even though “2” by itself isn’t a product in the ordinary sense).

But what if *n* is 0? Once upon a time, when nobody had yet defined 2^{0}, mathematicians had to make up their collective mind about what they wanted it to mean. They could have left it undefined, because it makes no sense to speak of a product of 2’s in which the number of 2’s is 0 and the number of multiplication signs is −1. But it became clear pretty quickly that there are contexts in which it’s useful to define 2^{0} to be 1 (and few or no contexts in which it’s useful to define 2^{0} to be some other value). One good feature of filling in the blank in “__, 2, 4, 8, 16, 32, …” with a 1 is that the resulting sequence 1, 2, 4, 8, 16, 32, … follows a uniform rule: each term is twice the term before (or, going backwards, each term is half of the term that follows).
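The empty-product convention is baked into most programming languages; for instance, Python’s `math.prod` of an empty list is 1, which is exactly why 2^{0} = 1 fits the pattern. A quick sketch (the function name is mine):

```python
import math

def power_of_two(n):
    """2**n computed as a literal product of n twos."""
    return math.prod([2] * n)  # math.prod([]) is 1, the empty product

assert [power_of_two(n) for n in range(6)] == [1, 2, 4, 8, 16, 32]
print(power_of_two(0))  # 1: the product of zero 2's
```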

For the practical-minded, here’s a financial application. Consider a bank that gives interest at the (unrealistic but simple-to-discuss) rate of 100% compounded annually. After *n* years, where *n* is any positive integer you like, an initial deposit of $100 will become $100 times 2^{n}. What’s the situation after 0 years? If “after 0 years” means “right after you’ve deposited your money”, then you have $100 in your account and not a penny more or a penny less, so it makes sense that 2^{0} should be taken to equal 1.

The permissive notion of salads that goes hand-in-hand with the conventional definition 2^{0} = 1 gives us a new way to see the null salad shining on its own, and not suffering from invidious comparisons with its more filling fellow-salads: if there are 0 ingredients to choose from, then we can make exactly 1 tossed salad from them, namely the null salad.

**COUNTING LAYERED SALADS**

What if we have salads in which structure matters? Let’s throw culinary realism to the winds here and consider salads that contain a single arugula leaf, a single basil leaf, a single celery leaf, a single dandelion leaf, and a single endive leaf. (Never mind that a true composed salad should have a base, a body, and a garnish.) How many such salads are there? More generally, if there are *n* different kinds of leaf, how many ways are there to build a salad by stacking together one leaf of each kind? (Yes, I know, if you stack actual leaves they won’t stay stacked. But this is math, not cuisine.)

When *n* is 1, there’s only 1 way to go.

When *n* is 2, there are 2 choices of which leaf to put on the bottom, and then we’re forced to put the other leaf on the top, so there are 2 possible salads.

When *n* is 3, there are 3 choices of which leaf to put on the bottom, and then 2 remaining choices for what to put on top of that, and then only 1 remaining option for what to put on the top, so the total number of possibilities is 3 × 2 × 1 = 6.

Again, I think you see where this is going: when *n* is 4, the total number of possibilities is 4 × 3 × 2 × 1 = 24, and when *n* is 5, the total number of possibilities is 5 × 4 × 3 × 2 × 1 = 120.

We have the symbol “*n*!” (pronounced “*n* factorial”) for the product of the counting numbers from 1 through *n*. So now we are ready to ask the question, how should 0! be defined?

Looking at the sequence __, 1, 2, 6, 24, 120, … in reverse, we see that starting from 120, we first divide by 5 (obtaining 24), then divide by 4 (obtaining 6), then divide by 3 (obtaining 2), then divide by 2 (obtaining 1). So if we want the pattern of divisions to continue, we should next divide by 1 (obtaining 1). By this reasoning, the right value for 0! is 1.
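The backwards-division pattern can be checked in a few lines of Python (a sketch; the function name is my own illustrative choice):

```python
import math

def factorial_by_stacking(n):
    """Count the ways to stack n distinct leaves: n * (n-1) * ... * 1."""
    count = 1
    for k in range(1, n + 1):
        count *= k
    return count  # with n = 0 the loop never runs, so the count is 1

assert [factorial_by_stacking(n) for n in range(6)] == [1, 1, 2, 6, 24, 120]
assert factorial_by_stacking(0) == math.factorial(0) == 1
print(factorial_by_stacking(0))  # 1: the empty stack
```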

If you prefer a more practical reason for the convention 0! = 1, consider tossing a fair coin *n* times in some gambling game. What’s the probability of getting heads *a* times and tails *b* times? The binomial formula

gives the right answer when *a* and *b* are both positive integers (let me know in the Comments if you know a well-written online reference for this formula that explains why it’s true!). It’s not important to understand where the formula comes from; what I want you to notice is that if you want it to give the right answer when *a*=0 or *b*=0, you’d better take 0! = 1. (If you take 0! = 0 then the expression has a 0 in the denominator and hence makes no sense; if you take 0! to be any nonzero number other than 1, the expression makes sense but gives the wrong answer; if you take 0! to be 1, you get the right answer.)
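The formula itself appears as a displayed image in the original; reconstructed here in LaTeX from context (it is the standard binomial probability, with *a* + *b* tosses in all):

```latex
\frac{(a+b)!}{a!\,b!} \cdot \frac{1}{2^{a+b}}
```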

For those who prefer computer science to probability, the number of ways to form a bit-string consisting of *a* 0’s and *b* 1’s is

(Example: When *a* = *b* = 2, the expression evaluates to 24 / 4 = 6, and the 6 bit-strings consisting of two 0’s and two 1’s are the binary words **0011**, **0101**, **0110**, **1001**, **1010**, and **1100**. If you know a good online proof, I’d love to include a link to it.) In the case where *a* is positive and *b* is zero, you can check that the number of such bit-strings is 1, and that the above expression takes the value 1 only if 0! is defined to be 1.
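The displayed expression, shown as an image in the original, is (*a*+*b*)! / (*a*! · *b*!), which the worked example 24/4 = 6 matches. Here’s a brute-force Python check of the example and the edge cases (an illustrative sketch, not part of the original):

```python
from itertools import permutations
from math import factorial

def count_bitstrings(a, b):
    """Distinct arrangements of a 0's and b 1's, counted by brute force."""
    return len(set(permutations("0" * a + "1" * b)))

assert count_bitstrings(2, 2) == factorial(4) // (factorial(2) * factorial(2)) == 6
assert count_bitstrings(5, 0) == 1   # only "00000"; the formula needs 0! = 1 here
assert count_bitstrings(0, 0) == 1   # the empty bit-string, lambda
print(count_bitstrings(2, 2))  # 6
```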

It’s fun to contemplate the case where *a* and *b* are both 0. For people who study probability, this corresponds to a gambling game in which, just before you make your first toss, you remember what your parents told you about gambling, suddenly “need to use the bathroom”, and sneak out the back. If you do this, you’re 100% certain to toss 0 heads and 0 tails, so the probability of that event is 1. For computer scientists, *a*=*b*=0 corresponds to the bit-string consisting of no bits at all, often written as λ instead of as .^{4} The bit-string λ has length 0, but it’s a mathematical entity, and there’s 1 of it.

So now we can ask: are there 0! layered salads with 0 ingredients? One way to define a layered salad (in the somewhat strange way I’m using the term) is a collection of edible leaves such that (a) each type of leaf that occurs must occur only once, and (b) any two leaves that occur must occur in some definite order. In a persnickety mathematical sense, the empty salad satisfies both conditions vacuously, because (a) you can’t show me a leaf that occurs more than once, and (b) you can’t show me two leaves that *don’t* occur in some definite order. Therefore, there’s 1 (i.e., 0!) layered salad with 0 ingredients.

So if your fridge is empty, and you’re tired of tossed salad with 0 ingredients, you can vary your diet by having a *layered* salad with 0 ingredients instead.

**THE TILING THAT HAS NO TILES AND THE PATH THAT HAS NO STEPS**

Here’s another application of the one-way-to-do-nothing principle. How many ways are there to tile a 2-by-0 rectangle with identical 2-by-1 tiles? We’ll overcome our initial *stupor vacui*^{5} by broadening the question, replacing 0 by an unknown positive integer *n*; then, once we’ve seen what the governing pattern is, we’ll sneak up on 0 by counting backward from the positive integers, and then we’ll notice that the paradoxical answer we obtain actually makes a persnickety kind of sense.

The number of ways to tile a 2-by-1 rectangle with 2-by-1 tiles is 1:

The number of ways to tile a 2-by-2 rectangle with 2-by-1 tiles is 2:

The number of ways to tile a 2-by-3 rectangle with 2-by-1 tiles is 3:

The number of ways to tile a 2-by-4 rectangle with 2-by-1 tiles is 5:

Mathematician/dad/videographer Mike Lawler helped his child study this problem. For general values of *n* the answer is given by the Fibonacci sequence 1,2,3,5,8,13,…

Once you’ve understood why each term is equal to the sum of the two preceding terms (see Mike’s follow-up post on this topic, featuring both kids this time), you can turn the pattern around and say that each term is equal to the *difference* between the two *succeeding* terms. More precisely, the *n*th term in the sequence is equal to the (*n*+2)nd term minus the (*n*+1)st term. If we apply that formula in the case *n*=0, we see that the number of tilings of the 2-by-0 rectangle “should” be the number of tilings of the 2-by-2 rectangle minus the number of tilings of the 2-by-1 rectangle, which gives us 2 minus 1, or 1.

That is, the pattern suggests that there’s exactly 1 way to tile a 2-by-0 rectangle with 2-by-1 tiles, and if you think about it, that’s exactly right. The way to tile a 2-by-0 rectangle with 2-by-1 tiles is to say “There’s room for exactly 0 tiles, so I’m done” and then fold your arms. Or as Mike’s son puts it: the way to do it is *not to do anything*. And there’s exactly one way to do that.
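The counting translates directly into code. In this Python sketch (mine, not from Mike’s posts), the recurrence comes from splitting on the last column: it holds either one vertical tile, leaving a 2-by-(*n*−1) rectangle, or two horizontal tiles, leaving a 2-by-(*n*−2) rectangle:

```python
def tilings(n):
    """Number of ways to tile a 2-by-n rectangle with 2-by-1 tiles."""
    if n == 0:
        return 1  # the tiling with no tiles: do nothing, in exactly one way
    if n == 1:
        return 1
    # last column vertical, or last two columns covered by horizontal pair
    return tilings(n - 1) + tilings(n - 2)

assert [tilings(n) for n in range(7)] == [1, 1, 2, 3, 5, 8, 13]
print(tilings(0))  # 1
```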

As important in modern combinatorics as the Fibonacci numbers are the Catalan numbers 1,2,5,14,42,… (discussed in a video listed in the References). One thing Catalan numbers count is paths in a grid that stay in a triangle. For instance, 42 (the 5th Catalan number) counts the paths in the picture below that join the point *A* = (0,0) to the point *B* = (5,5) consisting of 10 steps that stay within the triangle bounded by dashed lines.

The *n*th Catalan number is equal to the number of paths from (0,0) to (*n*,*n*) consisting of 2*n* steps that stay within the triangle bounded by (0,0), (*n*,0), and (*n*,*n*). A formula for the *n*th Catalan number is

Plugging in *n*=0, we get the answer 1. This counts the number of paths from (0,0) to (0,0) consisting of 0 steps that stay within the triangle bounded by (0,0), (0,0), and (0,0). The three vertices of this triangle are the exact same point, so this region gives you no room to move; but since we’re looking at paths of length 0 within that region, there’s no need to move. Your journey is over as soon as it’s started.
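The formula, which appears as an image in the original, is C(*n*) = (2*n*)! / (*n*! (*n*+1)!). The path-counting interpretation can be brute-forced; here’s a Python sketch (mine), where staying inside the triangle means never letting *y* exceed *x*:

```python
def paths(x, y, n):
    """Number of legal completions of a path currently at (x, y),
    heading for (n, n) by unit steps right or up with y <= x always."""
    if (x, y) == (n, n):
        return 1  # at B already: the path with 0 remaining steps counts once
    total = 0
    if x < n:
        total += paths(x + 1, y, n)  # step right
    if y < x:
        total += paths(x, y + 1, n)  # step up, keeping y <= x
    return total

assert [paths(0, 0, n) for n in range(6)] == [1, 1, 2, 5, 14, 42]
print(paths(0, 0, 0))  # 1: the path with no steps
```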

**MATH MINDFULNESS**

I’ve come about as close to talking about the null set as one can without actually talking about it. I’ll write about the null set some other time; in the meantime, you can read Evelyn Lamb’s fun essay on the topic, which will give you some insight into the body of mathematical work Ben Orlin is riffing on in his cartoon back at the beginning of this piece where he describes the null salad as “the basis of all salads” in the set-theoretic sense.

The null salad (tossed or layered), the empty binary word λ, the tiling with no tiles and the path with no steps inhabit the no-man’s-land between Nothingness and Somethingness, and in discussing them I’m trespassing onto territory claimed by mystics. So I’ll close by proposing, with tongue firmly in cheek, a mathematico-mystical morning meditation practice that will help novices come to deeply understand the mathematics of Doing By Not-Doing.

Your ritual is to prepare and consume null salad. Choose your ingredients, which you need not have on hand; lack of ingredients does not matter, since the recipe requires none of them. Perhaps your choice today is a simple null fruit salad, consisting of 0 apples. After preparing your salad, perform arithmetic operations on your salad that in no way mar its perfect null-ness. For instance, add 0 apples to your salad. Then subtract 0 apples. Then double your portion. Then halve it. It may seem impossible to cut your apple salad in half because there are no apples to cut nor any knife in hand to cut them with, but that does not make the task impossible; indeed, as soon as the task comes into your mind, the task is already completed.

As you carry out these operations, including the final step of consuming the salad in zero mouthfuls, repeat in your mind the mantra λ, The Empty Word, whose meaning is: “There is nothing to be done, and I have already done it.”

Feel enlightened?

Good. Now go eat something.

Next month: What Proof Is Best?

*Thanks to Sandi Gubin, David Jacobi, Joe Malkevitch, Henri Picciotto, and Evan Romer.*

**ENDNOTES**

#1. I would’ve written “salad days”, but that seemed too cheap a pun. Oops, I just wrote it anyway.

#2. The restaurant was Sweeney Todd’s Pizza in Cambridge, England, no longer in business. (Maybe some health inspectors heard disturbing rumors about what sort of meat went on their pizzas?) Incidentally, Silouan Winter has pointed out to me on Twitter that in southern Germany you can find restaurants that offer an empty plate for kids who partake of their parents’ food, called a “Räuberteller” (“robber’s plate”).

#3. I’m reminded of Salvador Dalí’s speech in which he stood up, said “I shall be so brief I have already finished,” and then sat down. Or did he? Anyone who can find a source for this quote should post it in the Comments!

#4. See the StackExchange discussion of the origin of the symbol λ for the empty string.

#5: I’m coining the term “*stupor vacui*” in analogy with the existing phrase “*horror vacui*”. It’s intended to refer to the way the mind boggles when it confronts the vacuous, and in desperation clings to answers like “Zero!” or “Nonsense!” or “Impossible!” and rejects an incongruously non-vacuous answer like “Exactly one”.

**REFERENCES**

Alissa Crans, “A Surreptitious Sequence: The Catalan Numbers” (video), produced by the Mathematical Association of America.

Martin Gardner, Nothing; chapter 1 in “Mathematical Magic Show”.

Martin Gardner, More Ado About Nothing; chapter 2 in “Mathematical Magic Show”.

Martin Gardner, Fibonacci and Lucas Numbers; chapter 13 in “Mathematical Circus”.

Martin Gardner, Catalan Numbers; chapter 20 in “Time Travel and Other Mathematical Bewilderments”.

Evelyn Lamb, A Few of My Favorite Spaces: The Empty Set.

It’s not hard to see that this idea has serious limitations. For instance, even though many legal issues surrounding abortion hinge on different definitions of the word “life”, when it comes to the moral side of the debate, definitions don’t change anyone’s mind. Usually we each choose the definition that matches an outcome we’ve decided on, not the other way around. But in mathematics (thank goodness for the consolations of math!), things are different.

Definitions have been on my mind lately for two reasons: I’m teaching lots of definitions to the students in my discrete mathematics course, and I’ve been reading about the work of Kevin Buzzard and his collaborators, who have been teaching lots of definitions to a computer program for doing mathematics.

Definitions are nothing new in mathematics — Euclid’s *Elements* starts with a few, such as “a point is that which hath no part”. But surprisingly, Euclid’s proofs don’t make much use of the initial definitions; it’s the axioms (and the later definitions) that do the heavy lifting.^{1} One modern point of view about this paradoxical situation is that even though Euclid’s first definitions give readers a way to *think* about points, lines, planes, etc., it’s the axioms that implicitly tell us what these mathematical objects *are*. That is, at an abstract level, Geometry Is As Geometry Does, and a Euclidean “point”, rather than being that-which-hath-no-part, is any object of thought whose properties in relation to other points (and lines and planes) obey Euclid’s axioms. Under this perspective, many mathematical definitions can look a bit circular, though “relational” would be a more apt term.

In a mathematical treatise like Euclid’s, you don’t get all the definitions at the beginning; they’re peppered throughout, with later definitions depending on earlier ones. You could try to read all the definitions at the start, but aside from the fact that you’d overload your brain, a lot of the definitions, read in isolation, would seem arbitrary. You’d be missing the way that those definitions give rise to interesting theorems that retroactively justify those definitions. Amid all the definitions one *could* make, some are more natural, interesting, or useful than others, and it’s not clear from the start which those definitions will be. For instance, until you have some experience multiplying and factoring numbers, it may not be clear why the concept of prime numbers matters; and I think it’s only when you’ve seen the unique factorization theorem that the true significance of the prime numbers comes into view.

I like what the late mathematician and educator Charles Wells wrote about definitions:

Some students don’t realize that a definition gives a magic formula — all you have to do is say it out loud. More generally, the definition of a kind of math object, and also each theorem about it, gives you one or more methods to deal with the type of object.

For example,

*n* is a prime by definition if *n* > 1 and the only positive integers that divide *n* are 1 and *n*. Now if you know that *p* is a prime bigger than 10 then you can say that *p* is not divisible by 3 because the definition of prime says so. (In Hogwarts you have to say it in Latin, but that is no longer true in math!) Likewise, if *n* > 10 and 3 divides *n* then you can say that *n* is not a prime by definition of prime.

You now have a magic spell — just say it and it makes something true!

What the operability of definitions and theorems means is: A definition or theorem is not just a static statement, it is a weapon for deducing truth.

The role of definitions changes as one advances in one’s mathematical education. Some of the change is quantitative. New definitions build on earlier definitions, which build on even earlier definitions, and so on; the more math you learn, the taller your personal definition-tower becomes. The same is true on a communal level: understanding a recent definition like Peter Scholze’s notion of “perfectoid spaces” requires understanding dozens if not hundreds of concepts that the definition builds upon, to which thousands of mathematicians have contributed.

But a less obvious, qualitative difference is that *sometimes definitions don’t even make sense without certain theorems*. That is, a definition may make a tacit claim, and proving the claim may take hard work. Many definitions in advanced mathematics are like this.

Sometimes the tacit claim is *existence* or *uniqueness*. As a non-mathematical example, notice that the phrase “the Prime Minister of the United States” makes no sense; neither does the phrase “the baseball team of the United States”, but for a different reason. The first phrase doesn’t denote an actual person, because the U.S. has no Prime Minister; the second phrase doesn’t denote an actual team, because the U.S. has many baseball teams. In cases like these, the use of the word “the” followed by a singular noun or noun-phrase requires that its referent *exist* and that its referent be *unique*.^{2}

Here’s a mathematical example: when we say “The infinite decimal .333… is defined as the unique number that lies in the intervals [.3,.4], [.33,.34], [.333,.334], etc.”, we’re asserting simultaneously that there is *at least* one such number and that there is *at most* one such number. Many mathematical definitions share this property of making tacit claims; proving these claims requires a side-bar. Those proofs may in turn depend on other theorems, and other definitions, which in turn depend on other theorems. So if you could look inside someone’s brain and somehow see a definition as literally sitting atop earlier definitions, the tower wouldn’t consist merely of definitions; there’d be theorems mixed in there too.
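The existence-and-uniqueness claim is easy to probe numerically; here’s a Python sketch using exact fractions (illustrative only, not from the essay): every interval in the chain contains 1/3, and the widths shrink toward 0, so no other number can lie in all of them.

```python
from fractions import Fraction

one_third = Fraction(1, 3)
for k in range(1, 10):
    lo = Fraction(int("3" * k), 10 ** k)    # 0.33...3 with k threes
    hi = lo + Fraction(1, 10 ** k)          # 0.33...4
    assert lo < one_third < hi              # 1/3 lies in every interval
    assert hi - lo == Fraction(1, 10 ** k)  # and the widths shrink to 0
```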

Another wrinkle in advanced mathematics is that many concepts are mergers of several different-looking concepts that turn out to be equivalent for non-obvious reasons. One might see a passage of an advanced math textbook that looks something like this:

**Theorem**: Let *F* be a foo [where a “foo” is some previously-defined kind of mathematical object]. Then the following conditions are equivalent:

(1) …

(2) …

(3) …

**Proof**:

(1) implies (2): …

(2) implies (3): …

(3) implies (1): …

**Definition**: Any foo satisfying conditions (1), (2), and (3) is called a *fnord*.

Which of the three conditions is the “true” definition of fnordness? All of them! Which of the three conditions one should focus on will depend on context.^{3} This situation crops up so often in mathematics that the initialism “TFAE” (for “The Following Are Equivalent”) has become a standard part of a mathematician’s education in the English-speaking world. (If any of you know corresponding initialisms in French or other languages, please post them in the Comments!)

Someone at the frontier of mathematical research may wind up in a situation where there are multiple conditions that are not *quite* equivalent, and must choose which one to canonize as the “right” definition. This requires a certain amount of prescience about the direction of future developments, and since mathematical history (like any other kind) has a way of surprising the people who live through it, sometimes mathematicians get it wrong. For instance, once upon a time it seemed natural to define the word *prime* so as to include the number 1, but nowadays mathematicians agree that, given the directions that number theory has gone in, it’s best to call 1 neither prime nor composite, but to call it a *unit*.^{4}

There are interesting issues about how to tweak an existing definition to handle borderline cases (see the discussion of 0^0 in my May 2019 essay), but a higher order of creativity comes from devising (good) new definitions. Ideally a new definition should enable the creator of the definition to solve existing problems while introducing new directions for future research.

Here’s Kevin Buzzard writing about the research interests of the people he works alongside at Imperial College in London, and contrasting the definition of perfectoid spaces (and other hot topics) with the less fashionable notion of Bruck loops, about which I’ll say nothing except to mention that Buzzard defines them in the space of one long paragraph, thereby demonstrating that one *can* define them succinctly, under a suitable definition of succinctness:

I work in a mathematics department full of people thinking about mirror symmetry, perfectoid spaces, canonical rings of algebraic varieties in characteristic *p*, étale cohomology of Shimura varieties defined over number fields, and the Langlands philosophy, amongst other things. Nobody in my department cares about Bruck loops. People care about objects which it takes an entire course to define, not a paragraph.

So what are the mathematicians I know interested in? Well, let’s take the research staff in my department at Imperial College. They are working on results about objects which in some cases take hundreds of axioms to define, or are even more complicated: sometimes even the definitions of the objects we study can only be formalised once one has proved hard theorems. For example the definition of the canonical model of a Shimura variety over a number field can only be made once one has proved most of the theorems in Deligne’s paper on canonical models, which in turn rely on the theory of CM abelian varieties, which in turn rely on the theorems of global class field theory. That’s the kind of definitions which mathematicians in my department get excited about — not Bruck Loops.

When a new definition like “perfectoid spaces” garners professional acclaim for the person who came up with it, and word gets out, it’s natural for the scientifically-interested public to want someone to tell them what the fuss is about. And this is where things get tricky. For the expert, the new definition is at the top of a personal Jenga-tower in their brain. There simply isn’t time to build a copy of that tower, or even a streamlined version of it, in the reader’s brain. There needs to be something simpler, and a certain amount of distortion is inevitable.^{5}

Some writers resort to metaphor. Others connect the new concept with concepts slightly lower in the Jenga-tower, treating them all as black boxes and explaining how they relate to one another, saying things like “[Concept X] unifies [Concept Y] with [Concept Z]” without ever explaining the details of Concepts X, Y, and Z. Still others despair of explaining the math and resort to biography (e.g., “The crucial insight finally came to her while she was scuba-diving during her honeymoon in Australia”).^{6}

To see what happened when Michael Harris accepted the challenge of trying to explain Scholze’s perfectoid spaces to a general scientific readership, read his essay “Is the tone appropriate? Is the mathematics at the right level?”, and for comparison read Gilead Amit’s essay “The Shape of Numbers” that *New Scientist* decided to publish instead of what Harris wrote. And then you’ll understand why I’ve decided to be a math essayist rather than a math journalist (much as I admire mathematicians who step into the fray).

I can’t say I understand Harris’ essay more than superficially. I’m intrigued by the idea that Scholze’s theory of diamonds allows you to “clone” a prime, but what does that really mean? Maybe if I’d studied Spec(**Z**) back in grad school (or if I took the time to learn about it now) I’d have a clue. And while we’re talking about Spec(**Z**) (or rather talking about not talking about it), I’ve always wondered what diophantine algebraic geometers mean when they say primes are like knots; I hope I’ll understand this someday!

Frank Quinn wrote an essay that has a very nice passage about the role of definitions in modern mathematics:

Definitions that are modern in this sense were developed in the late 1800s. It took awhile to learn to use them: to see how to pack wisdom and experience into a list of axioms, how to fine-tune them to optimize their properties, and how to see opportunities where a new definition might organize a body of material. Well-optimized modern definitions have unexpected advantages. They give access to material that is not (as far as we know) reflected in the physical world. A really “good” definition often has logical consequences that are unanticipated or counterintuitive. A great deal of modern mathematics is built on these unexpected bonuses, but they would have been rejected in the old, more scientific approach. Finally, modern definitions are more accessible to new users. Intuitions can be developed by working directly with definitions, and this is faster and more reliable than trying to contrive a link to physical experience.

I’ll end by quoting Peter Scholze himself:

What I care most about are definitions. For one thing, humans describe mathematics through language, and, as always, we need sharp words in order to articulate our ideas clearly. For example, for a long time, I had some idea of the concept of diamonds. But only when I came up with a good name could I really start to think about it, let alone communicate it to others. Finding the name took several months (or even a year?). Then it took another two or three years to finally write down the correct definition (among many close variants). The essential difficulty in writing “Etale cohomology of diamonds” was (by far) not giving the proofs, but finding the definitions. But even beyond mere language, we perceive mathematical nature through the lenses given by definitions, and it is critical that the definitions put the essential points into focus.

*Thanks to Sandi Gubin for help with this piece.*

Next month: The Null Salad.

**ENDNOTES**

#1. Sometimes Euclid’s definitions and axioms also hinge on unstated assumptions whose tacit role only came into view many centuries later, but that’s another story.

#2. The distinction between “a” and “the” came to my attention many years ago when I was touring the parts of the Mormon Tabernacle that are open to the public, and the tour guide said “The people who settled Utah were hard-working people; that’s why we call Utah a beehive state.” I asked her whether she meant “the”, since after all Utah is often called *The* Beehive State, but she said she meant “a”. I think she chose the indefinite article to focus her listeners on what kind of people Utahans were and are, rather than inviting comparisons between Utahans and non-Utahans.

#3. Following up on the definition of fnords as special kinds of foos, there might be other theorems, such as “If *F*_{1} and *F*_{2} are fnords, then so is *F*_{1}+*F*_{2}” (assuming that addition of foos has already been defined). A happy asymmetry comes to the aid of someone trying to prove such a theorem: since *F*_{1} and *F*_{2} are (by hypothesis) fnords, each of them satisfies *all three* of the magic fnord-properties, so all three properties may be legitimately assumed; but to prove that *F*_{1}+*F*_{2} is a fnord too, it suffices to prove *just one* of the properties, since the other two come along for free, thanks to the Theorem.

#4. It hasn’t escaped my attention that, in a sense, letting the needs of mathematicians dictate the definitions mathematicians use is not entirely different from the way people let their verdicts on issues determine the definitions they use. People who condone abortion will define life one way, while people who condemn it will use a different definition. In a similar way, number theorists who value the unique factorization of numbers into primes will want to deny primeness to 1, while number theorists who couldn’t care less about unique factorization will — wait a minute, there are no number theorists like that! At least, none that I know of. The fact that number theorists call this result the Fundamental Theorem of Arithmetic tells you right away that there’s unanimity on that point. But, hypothetically, if there were a community of mathematicians who wanted to consider 1 to be prime, it wouldn’t cause a huge rift; we’d just need to introduce a second term, maybe “prome” or “primish”, to carry the variant meaning.

#5. The recently deceased mathematician John Tate, who laid much of the early groundwork for Scholze’s work over half a century ago in one of the most revolutionary doctoral theses of all time, was glumly resigned to the difficulty of conveying to non-mathematicians what he did for a living or why it mattered. His obituary quotes him as saying:

Unfortunately it’s only beautiful to the initiated, to the people who do it. It can’t really be understood or appreciated on a popular level the way music can. You don’t have to be a composer to enjoy music, but in mathematics you do. That’s a really big drawback of the profession. A non-mathematician has to make a big effort to appreciate our work; it’s almost impossible.

#6. I’m not actually aware of any mathematician making a crucial discovery during their honeymoon, but I’d bet it’s happened; I only hope that the mathematician had the restraint to wait until the end of the honeymoon before starting to write it up.

**REFERENCES**

Gilead Amit, “The shape of numbers”, posted as “‘Perfectoid geometry’ may be the secret that links numbers and shapes”, April 25, 2018.

Kevin Buzzard, “A computer-generated proof that nobody understands”, posted July 6, 2019.

Michael Harris, “Is the tone appropriate? Is the mathematics at the right level?”, posted around June 1, 2018.

Michael Harris, “The perfectoid concept: Test case for an absent theory”.

Frank Quinn, “A Revolution in Mathematics? What Really Happened a Century Ago and Why It Matters Today”, Notices of the American Mathematical Society, January 2012.


First I’ll play nice. I’m thinking of an infinite binary sequence that begins 0101101010110101… How will you guess the next bit? My infinite sequence happens to repeat with period seven, but if you didn’t know that ahead of time, what sort of bit-prediction method would you use? More importantly, how would you get a computer to predict successive bits and learn from its mistakes? This kind of question is relevant to data compression.

A simple-minded but curiously effective general procedure for predicting the next bit involves looking at the different-length *suffixes* of the currently-known part of the sequence, where the suffix of length *k* consists of the last *k* bits, and looking for earlier occurrences of those particular patterns in the sequence. For instance, the suffix of length 3 in 0101101010110101 is the pattern 101. This pattern of bits has occurred earlier in the sequence (several times, in fact). A longer suffix is 0101, which has also occurred earlier in the sequence. The curiously effective procedure for predicting the next bit requires that you first identify the *longest* suffix that has occurred earlier in the sequence. That suffix happens to be 010110101:

0101101**010110101** (suffix of length 9)

**010110101**0110101 (same pattern, seen earlier)

Call the longest previously-seen suffix S. The curiously effective procedure prescribes that, having found S, you must locate the previous (i.e., second-to-last) occurrence of S. Then you must see what bit occurred immediately after the previous occurrence of S:

**010110101**__0__110101

In this case, that bit is a 0, so you guess that the next bit will be 0.

0101101**010110101**__0?__

This guess is right, and that’s no accident: as long as I’ve picked a *periodic* sequence of bits (and I did), your use of this simple-minded method guarantees that you’ll guess all bits correctly from some point on, even if you don’t know what the period is.
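The procedure is easy to mechanize. Here’s a minimal sketch in Python (the function name `predict` is mine); it tries suffix lengths from longest to shortest, and the length-0 (empty) suffix guarantees it always produces a guess:

```python
def predict(s: str) -> str:
    """Guess the next bit of s: take the longest suffix of s that has
    occurred earlier, find the most recent earlier occurrence of that
    suffix, and return the bit that immediately followed it."""
    for k in range(len(s) - 1, -1, -1):    # suffix lengths, longest first
        suffix = s[len(s) - k:]            # the empty string when k == 0
        # most recent occurrence ending strictly before the final position,
        # i.e. an occurrence other than the suffix itself
        p = s.rfind(suffix, 0, len(s) - 1)
        if p != -1:
            return s[p + k]                # the bit that followed it

print(predict("0101101010110101"))  # → 0, as in the worked example
```

Restricting `rfind` to end at `len(s) - 1` is what rules out matching the suffix against itself: any accepted occurrence is followed by a bit that already exists in the sequence.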

Okay, now it’s time for me to stop playing nice. I’m thinking of an infinite sequence whose first sixteen terms are 0100110101110001. (Spoiler: it’s called the Ehrenfeucht-Mycielski sequence, and you can watch me construct the first dozen terms in a Barefoot Math video I just posted.) What’s your guess for the seventeenth term? The longest suffix that’s occurred previously is 001,

01**001**10101110**001**

and the previous occurrence of 001 was followed by a 1,

01**001**__1__0101110001

so if you follow the procedure described above you’ll guess that the next bit is 1.

0100110101110**001**__1?__

“Wrong,” I say; “the next bit is 0. This really isn’t your day; that’s the seventeenth straight time that you’ve been wrong!”

I’m being mean to you, but I’m not changing my mind about the sequence as I go; what I had in mind from the start was the infinite sequence of bits that makes all your guesses wrong. (I can do this because I know what prediction method you’re using; this enables me to front-load all my meanness and just go on autopilot after starting the game.) This sequence was invented by Andrzej Ehrenfeucht and Jan Mycielski in 1992, and is described in a nice 2003 article by Klaus Sutner. (I’ve cut some corners in my explanation; to really do it properly, we need to allow the empty string to be considered the “suffix of length 0”. The Wikipedia page gives a more rigorous treatment.)
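The front-loaded meanness can be sketched in a few lines of Python: generate the sequence by always appending the opposite of whatever the suffix-matching predictor would guess (the helper names here are mine, not standard):

```python
def predict(s: str) -> str:
    """Guess the next bit: the bit that followed the most recent earlier
    occurrence of the longest previously-seen suffix of s (the empty
    suffix, at k == 0, always matches)."""
    for k in range(len(s) - 1, -1, -1):
        p = s.rfind(s[len(s) - k:], 0, len(s) - 1)
        if p != -1:
            return s[p + k]

def em_sequence(n: int) -> str:
    """First n bits of the Ehrenfeucht-Mycielski sequence: start with 0,
    then always append the opposite of the predicted bit."""
    s = "0"
    while len(s) < n:
        s += "1" if predict(s) == "0" else "0"
    return s

print(em_sequence(16))  # → 0100110101110001
```

Running this reproduces the first sixteen bits quoted in the text, and the seventeenth bit it produces is indeed 0, the opposite of the guess of 1 worked out above.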

It’s believed that the sequence exhibits “normality”: that is, it’s believed that half of the bits are 0s and half are 1s, that the four patterns 00, 01, 10, and 11 each appear a quarter of the time, that the eight patterns 000, 001, …, 110, and 111 each appear one-eighth of the time, etc. What’s more, the sequence seems to have distinct eras, where during the *m*th era the sequence is “trying” to make sure that each of the 2^{*m*} different possible patterns of length *m* occurs equally often, so that the discrepancy between the number of 0s and the number of 1s gets smaller as the eras pass.

Or at least, the discrepancy *appears* to be smaller. Sutner’s simulations ran for millions of steps, and it’s possible for us to go farther now, but mere calculation can never tell us what really happens out near infinity where the trains don’t run. Mathematicians believe that for all large enough *n*, the number of 0’s in the first *n* bits of the Ehrenfeucht-Mycielski sequence differs from *n*/2 by less than sqrt(*n*) divided by one million. In fact, they believe that the preceding sentence remains true if you replace “million” by “billion” or any larger number you like (though the meaning of “large enough” will need to be adjusted accordingly). But: not only have they *not* proved this, they don’t even know how to prove that this claim is true if sqrt(*n*)/1,000,000 is replaced by the much larger number *n*/1,000,000. They haven’t proved that the asymptotic density of 0’s (or 1’s) is 1/2. This is the notorious Ehrenfeucht-Mycielski balance problem, and it’s been an open problem for over twenty-five years.
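For a quick empirical taste of the balance phenomenon (this brute-force generator is my own sketch, far less efficient than what Sutner’s simulations used), one can count the 0’s in a short prefix; the density sits near 1/2, comfortably inside the Kieffer–Szpankowski bounds:

```python
def em_prefix(n: int) -> str:
    """First n bits of the Ehrenfeucht-Mycielski sequence (brute force)."""
    s = "0"
    while len(s) < n:
        # find the longest suffix seen earlier (k == 0 always matches),
        # take the bit after its most recent earlier occurrence...
        for k in range(len(s) - 1, -1, -1):
            p = s.rfind(s[len(s) - k:], 0, len(s) - 1)
            if p != -1:
                s += "1" if s[p + k] == "0" else "0"   # ...and flip it
                break
    return s

bits = em_prefix(256)
print(bits.count("0") / len(bits))   # empirically close to 1/2
```

Of course, as the text says, no amount of such computation settles what happens out near infinity.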

The Ehrenfeucht-Mycielski sequence exhibits the phenomenon of quasirandomness, weaker than pseudorandomness but still quite interesting. One of the frustrations of the study of quasirandom processes is the persistent gap between what we can guess and what we can prove. As Paul Erdős said of the Collatz Conjecture, “Mathematics may not be ready for such problems.”

But that’s a defeatist attitude, and I want to end on a positive note. So let me announce here that, after much work, it’s been shown by Kieffer and Szpankowski that the asymptotic density of 0’s and 1’s in the Ehrenfeucht-Mycielski sequence, if it exists, must lie between 1/4 and 3/4.

Oh, so you think it must be easy to prove that the density exists? Guess again.

*Thanks to Bill Gasarch, Cris Moore, Joel Spencer and Klaus Sutner.*

Next month: Let Us Define Our Terms.

**REFERENCES**

John C. Kieffer and W. Szpankowski, “On the Ehrenfeucht-Mycielski Balance Problem”, https://dmtcs.episciences.org/3542/pdf

Klaus Sutner, “The Ehrenfeucht-Mycielski Sequence”, http://www.cs.cmu.edu/~sutner/papers/CIAA-Sutner-2003.pdf
