I’m sure you’ve counted (“One, two, three, . . . ”) on too many occasions to count. The process can be boring (counting sheep), exciting (counting your winnings at a casino), or menacing (“If you kids aren’t at the dinner table by the time I reach ten, I’ll …”). But one thing counting is *not* is liberating. What could be less free than the inexorable succession of the counting numbers? And yet the very regularity of counting numbers gives us the freedom to think about them in multiple ways, arriving at conclusions along delightfully varied paths.

Consider the classic problem of adding all the numbers from 1 up to 100. The obvious method of computing the sum takes a long time, which is why (according to a legend that may or may not be true) a certain schoolteacher in Germany a few centuries ago asked his students to work it out on their slates; he wanted to buy himself a bit of peace. Unfortunately for him, one of his students was the young Carl Friedrich Gauss, future doyen of European mathematics, who knew even then that when you add up a bunch of numbers, the order in which you add them doesn’t matter. That regularity gave Gauss the freedom to add them in a different order, peeling off the numbers from both ends of the list in alternation:

1+100+2+99+3+98+…+49+52+50+51

Pairing up the numbers two by two as

(1+100)+(2+99)+(3+98)+…+(49+52)+(50+51),

Gauss quickly saw that the answer was 101 + 101 + 101 + … + 101 + 101, or 101 × 50, and astonished his teacher by writing “5050” on his slate before even a minute had passed.^{1}
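Gauss needed only a slate, but if you’d like to watch the pairing trick run, here’s a minimal Python sketch (the function name is mine, and the sketch assumes *n* is even, as it is for *n* = 100):

```python
def gauss_pair_sum(n):
    """Sum 1 + 2 + ... + n by pairing opposite ends (n assumed even)."""
    pairs = [(k, n + 1 - k) for k in range(1, n // 2 + 1)]
    # every pair -- (1, n), (2, n-1), and so on -- adds up to n + 1
    assert all(a + b == n + 1 for a, b in pairs)
    # and there are n/2 such pairs, so the total is (n + 1) * (n / 2)
    return (n + 1) * (n // 2)

print(gauss_pair_sum(100))  # prints 5050, matching Gauss's slate
```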

Gauss wasn’t the first person to figure out how to add the numbers from 1 up to *n*; the ancient Greeks (and probably other civilizations whose mathematical ideas weren’t as amply recorded or don’t get as much attention) knew that the sum is always half of the product of *n* and *n*+1. The way they proved it was by cutting an *n*-by-(*n*+1) array of dots into two triangles, as shown below for *n* = 10:

The triangular region to the left of the diagonal, read by rows from top to bottom, has 1+2+…+10 dots, and the triangular region to the right of the diagonal, read by rows from bottom to top, also has 1+2+…+10 dots. So, taking inventory of all the dots in the 10-by-11 rectangle, we see that twice 1+2+…+10 must equal 10 times 11, which implies that 1+2+…+10 must equal half of 10 times 11.

The same reasoning shows that for any counting number *n*, the sum 1+2+…+*n* must equal *n*(*n*+1)/2. For *any* counting number *n*. No matter how big! The fact that we can know this is a pretty amazing thing when you stop to think about it. There are bigger numbers than we can ever count to, bigger numbers than we could ever write down, bigger numbers than we will ever imagine with our finite brains – yet our argument shows that no matter how big *n* is, there’s a relationship between the value of *n* and the value of the sum of all the counting numbers up to *n*.^{2}

**MATHEMATICAL INDUCTION**

Let’s look at a different proof. It won’t give the same jolt of insight that you get from looking at the picture on the previous page, but the method scales up to tackle harder problems (like showing that 1^{4} + 2^{4} + . . . + *n*^{4} = *n*(*n*+1)(2*n*+1)(3*n*^{2}+3*n*−1)/30, say) in a way that the geometrical approach doesn’t.
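No finite check is a substitute for the inductive proof to come, but it’s easy to test the quoted fourth-power formula for small *n* with a few lines of Python (function names are mine):

```python
def sum_of_fourth_powers(n):
    """The left-hand side, computed the slow way."""
    return sum(k**4 for k in range(1, n + 1))

def fourth_power_formula(n):
    """The closed form n(n+1)(2n+1)(3n^2 + 3n - 1)/30."""
    return n * (n + 1) * (2 * n + 1) * (3 * n**2 + 3 * n - 1) // 30

for n in range(1, 100):
    assert sum_of_fourth_powers(n) == fourth_power_formula(n)
print("formula checks out for n = 1 through 99")
```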

Let’s start by examining the proposition 1+2+3+…+99+100 = 5050 and the proposition 1+2+3+…+99 = 4950. Please forget for a moment that you already know that the former proposition is true, because that will distract you from the subtler point I’m trying to make. Let’s give these propositions names. Let *P* be the proposition “1+2+3+…+99 = 4950” and *Q* be the proposition “1+2+3+…+99+100 = 5050”. (Note that *P* and *Q* are not numbers; they’re assertions of numerical equality.) I claim that the left-hand side of *Q* is 100 more than the left-hand side of *P* and that the right-hand side of *Q* is 100 more than the right-hand side of *P*. Check it out:

*P*: 1+2+3+…+99 = 4950

*Q*: 1+2+3+…+99+100 = 5050

The left-hand side of *Q* is just like the left-hand side of *P*, except that there’s an extra 100; and the right-hand side of *Q* is 5050, which is 100 more than 4950. So *Q* is just *P* with 100 added to both sides, and *P* is just *Q* with 100 subtracted from both sides. *P* and *Q* either stand together or fall together. They’re either *both* true or *both* false.

You may wonder: Where am I going with this? All the way down to 1, is where! (That’s as low as we can go; I’m not considering zero to be a counting number.) For each counting number *n* between 1 and 100, let *P*_{*n*} be the proposition 1+2+…+*n* = (*n*)(*n*+1)/2.^{3} In this notation, the proposition *P* from before is *P*_{99} and *Q* is *P*_{100}, and the argument we just gave chains each of these propositions to the one below it.

Have I proved *P*_{100} yet? Not quite. But I’ve shown you that *P*_{100} and *P*_{99} are either both true or both false, and that *P*_{99} and *P*_{98} are either both true or both false, and so on, ending with the claim that *P*_{2} and *P*_{1} are either both true or both false. So it’s a package deal: you must either believe all one hundred of these assertions or disbelieve all one hundred of them.

And now for the punchline: go back and look at *P*_{1} again. It asserts merely that 1 = (1)(2)/2, which I’m sure you believe. So you must buy the whole package, and assent to all one hundred of the propositions, including *P*_{100}, the one we were interested in to begin with.

Note that to make the argument work, we didn’t actually need to know that for each *n* the propositions *P*_{*n*} and *P*_{*n*−1} are true; all we needed to know was that each pair of consecutive propositions stands or falls together.

But why stop at 100? The same reasoning applies to larger numbers too. For every counting number *n*, if *P*_{*n*} is the proposition that 1+2+…+*n* = (*n*)(*n*+1)/2, then the propositions *P*_{1}, *P*_{2}, *P*_{3}, … all stand or fall together, and since *P*_{1} is plainly true, all of them are true. This style of argument is called proof by mathematical induction.^{4} You can picture the counting numbers as the treads of an infinite stairway: establishing *P*_{1} puts your foot on the bottom tread, and each link in the chain of reasoning carries you up one more step.
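The two ingredients of this style of argument, a true starting proposition plus a step linking each proposition to the next, can be mimicked mechanically. Here’s a small Python sketch (names mine) that checks the base case and many instances of the step:

```python
def claimed_sum(n):
    """The right-hand side of P_n, namely n(n+1)/2."""
    return n * (n + 1) // 2

# base case: P_1 asserts that 1 = (1)(2)/2
assert claimed_sum(1) == 1

# the step: if the formula is right for n, then adding n+1 to it
# should give exactly the formula's value for n+1
for n in range(1, 10000):
    assert claimed_sum(n) + (n + 1) == claimed_sum(n + 1)

print("base case and induction step both check out")
```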

Warning: Even though there are infinitely many counting numbers, you shouldn’t get the idea that infinity is itself a counting number. It isn’t. The infinite stairway has a bottom marked 1, but it doesn’t have a top marked ∞. Finite staircases always have a top tread, but our fictional infinite staircase doesn’t. Some find this wonderfully odd while others find it disturbing. Indeed, some mathematicians think that because the human mind is finite, we need a radically finite mathematics that banishes the infinite. These are the finitists, who want us to view the infinite stairway not as a completed thing (not even a fictional one) but as a blueprint for a structure that can never be completed. The radical wing of finitism is ultrafinitism, which asserts that really really really big counting numbers don’t exist.

My own prediction, based on what I know of mathematical history, is that mathematics, with its track record of expanding to accommodate different philosophies of mathematics, will eventually build a big enough tent to house the ultrafinitists. But I also predict that ultrafinitistic proofs will be more complicated than their infinitistic counterparts and will be very difficult to understand for those who lack a grounding in infinitistic mathematics. By way of analogy, consider the way we teach Newtonian physics as a prologue to Einsteinian physics; the former is just an approximation to the latter, but it’s hard to understand the truer relativistic theory without understanding its less-true non-relativistic precursor. So the role played by the infinite stairway in the philosophy of mathematics may change in the coming centuries, but it is not likely to be supplanted as a mental image for the working mathematician or for the student learning mathematics.

**GET OUT YOUR CRAYONS**

If all this talk about sums and propositions and truth seems too abstract and colorless, here’s a down-to-earth way to think about induction via a coloring game.

I write down the numbers from 1 to *n* in a row (in the picture I’ve chosen *n* = 10), mark the number 1 with a smear of blue crayon and the number *n* with a smear of red crayon, and then hand the blue crayon to you.

After that, we’ll take turns making marks on not-yet-marked numbers, with you marking numbers blue and me marking numbers red. The game ends when there are two consecutive numbers (call them *k*−1 and *k*) with *k*−1 marked blue and *k* marked red (call this a “blue-red pair”); at that instant, whoever just moved (creating the blue-red pair) loses. Note that blue-red pairs are forbidden but red-blue pairs are allowed; if *k*−1 is marked red and *k* is marked blue, play continues.

Perhaps you’re wondering, what happens when there aren’t any not-yet-marked numbers and nobody’s lost the game yet? Perhaps you should try to construct a line of play that ends in a draw before you read further.

A famous theorem called Sperner’s Lemma tells us that a draw can’t happen. Specifically, the assertion that a draw can’t occur in our game is the 1-dimensional case of Sperner’s Lemma. (Sperner’s Lemma is usually discussed only in 2 dimensions and higher, where it’s far more interesting.) We can prove this by induction. We know that 1 is blue, so if 2 ever gets colored red, a blue-red pair is formed and somebody loses. So a draw can only happen if 2 is colored blue at the end of the game. What about 3? The same reasoning applies: we’ve shown that if there’s a draw, 2 must be colored blue at the end of the game, and if 3 is colored red, then a blue-red pair is present and the game wasn’t a draw after all. And so on. Ultimately, we reach *n*−1, and show that it too must eventually be colored blue. But at the instant that happens, *n*−1 and *n* form a blue-red pair, and somebody (namely the blue player, namely you) loses. So a draw is impossible.

Notice that we can state this result in a less combative way: regardless of whether the players compete or collaborate, there’s no way to color the numbers 1 through *n* so as to simultaneously satisfy the constraints (a) 1 is blue, (b) *n* is red, and (c) there are no blue-red pairs. The three conditions are incompatible.^{5}
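For small boards, a computer can also confirm the incompatibility by brute force. Here’s a Python sketch (function name mine) that simply tries every coloring of 1 through *n*:

```python
from itertools import product

def draw_coloring_exists(n):
    """Is there a coloring of 1..n with 1 blue, n red, no blue-red pair?"""
    for colors in product("BR", repeat=n):
        if colors[0] != "B" or colors[-1] != "R":
            continue  # conditions (a) and (b)
        if all(not (colors[i] == "B" and colors[i + 1] == "R")
               for i in range(n - 1)):
            return True  # condition (c) holds too: a draw would be possible
    return False

assert not any(draw_coloring_exists(n) for n in range(2, 13))
print("no draw-coloring exists for n = 2 through 12")
```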

The reason I’ve taken this detour is that the fact we just learned about the Sperner game (to wit, that conditions (a), (b), and (c) are incompatible) isn’t just an application of induction; you can turn things around and use the incompatibility result to *prove* the principle of mathematical induction!

Suppose we have some propositions *P*_{1}, *P*_{2}, *P*_{3}, … and we’d like to prove that they’re all true. Furthermore, suppose that *P*_{1} is true, and suppose that whenever *P*_{*k*} is true, *P*_{*k*+1} is true as well. Now imagine, for the sake of argument, that some proposition *P*_{*n*} is false. Color each number *k* from 1 to *n* blue if *P*_{*k*} is true and red if *P*_{*k*} is false. Then 1 is blue (that’s condition (a)), *n* is red (that’s condition (b)), and there can be no blue-red pair (that’s condition (c)), since a true *P*_{*k*} is never followed by a false *P*_{*k*+1}. But we’ve just seen that conditions (a), (b), and (c) are incompatible! So no proposition *P*_{*n*} can be false; all of them must be true.

[Hey readers: Did you like this section? It’s a bit of an unusual take on induction. Was it helpful or was it distracting? Let me know in the Comments!]

**BANISHING PHANTOMS**

Mathematical induction is great for proving that certain things always happen, but it can also be used to show that certain things *never* happen. (This shouldn’t be surprising, though, since Never is just another kind of Always.)

Say you want to draw a regular octagon on graph paper, like this:

This first effort isn’t bad, but it’s fairly evident that the horizontal and vertical sides are slightly longer than the diagonal sides. Can we do better? For that matter, why settle for merely *better*: can we do it *perfectly*?

What we are looking for are numbers *a* and *b* to replace 2 and 3 in the picture that will give us a regular octagon. That is, we want whole numbers *a* and *b* with the property that the hypotenuse of an isosceles right triangle with both its legs of length *a* has length equal to *b*. By the Pythagorean theorem, this is equivalent to the equation 2*a*^{2} = *b*^{2}. In other words, we are looking for a perfect square (*a*^{2}) which when doubled (2*a*^{2}) equals another perfect square (*b*^{2}).

So, you charge up the infinite stairway, looking for a counting number *a* with the property that 2*a*^{2} is a perfect square. Surely you’ll find one; in an infinite universe (so goes the cliché) everything you can imagine is bound to happen somewhere eventually, and the stairway is infinite, so surely you’ll eventually find the object of your quest!

Onward, past one million. You haven’t found such an *a* yet, but remember, success favors the bold, not the quitter. Onward, past one billion. Don’t give up now! You’ve invested so much in the search; why throw in the towel and throw all that effort away? Onward, past one trillion. Ignore all the nay-sayers, including the ones in your own head. Believe in yourself! Keep going! …

Well, no — please don’t. You are chasing a phantom; no such number *a* exists. One of the most amazing things about the stairway is that it’s possible for us to know, beyond doubt, that certain number-properties that we can formulate (such as the property “2*a*^{2} is a perfect square”) are not satisfied by any counting number *a* whatsoever. We don’t prove this by conducting an exhaustive survey of the stairway, which the human mind can’t do. Instead, we make use of a curious asymmetry of the stairway: you can ascend forever and never hit an obstacle, but any downward trip in the stairway must eventually end.^{6}

I wrote about the method of proof by descent in Reasoning and Reckoning so I won’t give the full details in this essay. But I’ll summarize here one of the conclusions I established there, which is that if *a* is a counting number with the property that 2*a*^{2} is a perfect square, then *a*/5 (call it *a*′) is a smaller counting number with the property that 2*a*′^{2} is a perfect square. Applying this same argument a second time, we find that *a*′/5 (call it *a*′′) is an even smaller counting number with the property that 2*a*′′^{2} is a perfect square. And so on, ad infinitum. We get an infinite sequence of ever-smaller counting numbers *a*, *a*′, *a*′′, … , each with the property that its square, doubled, is a perfect square. But wait a minute: how can we have an unending sequence of counting numbers, each smaller than the one before? There’s no such thing! So we conclude that there’s no such number *a*. It was never more than a phantom.
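If you’d like computational corroboration of the futility, here’s a quick Python search (the bound of one million is arbitrary, of course, and no finite search proves the point; only the descent does):

```python
from math import isqrt

# Look for a counting number a (up to an arbitrary bound) for which
# 2*a*a is a perfect square. The descent argument says there are none.
limit = 10**6
hits = [a for a in range(1, limit + 1)
        if isqrt(2 * a * a) ** 2 == 2 * a * a]
print(hits)  # prints [] -- the phantom never materializes
```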

Here’s a more geometrical way to banish the phantom from the infinite stairway, discovered by Joel Hamkins. Draw an octagon with corners labeled *A* through *H*:

We can draw a new octagon by swinging the edges through 90 degrees about an endpoint. For instance, we swing edge *AB* 90 degrees clockwise about *A*, and call the new point *A*′; we swing edge *BC* 90 degrees clockwise about *B*, and call the new point *B*′; and so on, around the octagon.

Here are the two things to notice: (a) if *A* through *H* are grid-points, then *A*′ through *H*′ must be grid-points as well; and (b) if *AB*···*H* is a regular octagon, then *A*′*B*′···*H*′ is a regular octagon. So if our original octagon had both properties, we can repeat the process as many times as we like, obtaining ever-smaller regular octagons with corners in the square grid. But all these octagons have side-lengths equal to counting numbers, so we get a sequence of ever-smaller counting numbers, which we know is impossible.

This method of argument was a favorite of Pierre Fermat’s. He used it for instance to prove the *n*=4 case of what is now called Fermat’s Last Theorem. Specifically, he showed that there don’t exist positive integers *x*, *y*, *z* satisfying *x*^{4} + *y*^{4} = *z*^{4}. In fact, he used the method of descent to prove something stronger: there don’t exist positive integers *x*, *y*, *z* satisfying *x*^{4} + *y*^{4} = *z*^{2}. He showed that if there were a phantom solution, there’d be a smaller phantom solution, and a smaller phantom solution, and so on, ad infinitum, which is impossible, since there cannot be an infinite sequence of ever-smaller counting numbers.

So you could say that the way to banish phantoms from the infinite stairway is to kick them down the stairs!

**UP THE DOWN STAIRCASE**

If the preceding results strike you as having a depressing vibe (“no way”; “can’t be done”; “impossible”; “don’t waste your time trying”), you’ll be glad to learn that the downward impossibility results can sometimes be flipped into upward *possibility* results.

Let’s look again at the picture of the two nested octagons and follow the action more carefully. The big octagon is determined by the two numbers 5 and 7 (the horizontal displacement from *A* to *B* is 7 and the horizontal displacement from *B* to *C* is 5), and the reason the octagon is so close to regular is that twice 5^{2} is very close to 7^{2}. Likewise, the small octagon is determined by the two numbers 2 and 3 (the horizontal displacement from *A*′ to *B*′ is 3 and the horizontal displacement from *B*′ to *C*′ is 2), and the reason the octagon is somewhat close to regular is that twice 2^{2} is somewhat close to 3^{2} . The recipe for getting from *a* = 5, *b* = 7 to *a*′ = 2, *b*′ = 3 is to take *a*′ = *b*−*a*, *b*′ = 2*a*−*b*.

But this method of descent, as I promised you, has a flip side: a method of ascent that lets us create infinitely many near-misses that come close to solving the original problem, and lets us systematically create ever-better grid-approximations to a regular octagon. It’s the descent process, run in reverse: *a* = *a*′+*b*′, *b* = 2*a*′+*b*′. Or if we prefer, *a* = *b*′+*a*′, *b* = *a*′+*a*. Here’s a graphical depiction of the process:

The picture is made of little number-snippets (1,1,2,3; 3,2,5,7; 7,5,12,17; 17,12,29,41, etc.) arranged in three-quarters circles. In each snippet, the third number is the sum of the first and second, and the fourth number is the sum of the second and third. If we wanted to continue the pattern, we’d have 41+29=70 at the top right and 29+70=99 beneath it. We get infinitely many pairs *a*, *b* satisfying 2*a*^{2} − *b*^{2} = ±1. Although the discrepancy between 2*a*^{2} and *b*^{2} never goes below 1 in absolute magnitude, in relative magnitude (compared to *a* and *b*, which keep growing exponentially) the discrepancy is getting exponentially smaller.
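The snippet-pattern above is easy to run in Python: starting from the pair (1, 1) and repeatedly applying *a*, *b* → *a*+*b*, 2*a*+*b* reproduces the pairs in the picture, and each one is a near-miss:

```python
# Run the ascent a, b -> a + b, 2a + b starting from (1, 1).
a, b = 1, 1
pairs = []
for _ in range(6):
    pairs.append((a, b))
    a, b = a + b, 2 * a + b  # the right-hand side uses the *old* a and b

print(pairs)  # [(1, 1), (2, 3), (5, 7), (12, 17), (29, 41), (70, 99)]

# each pair is a near-miss: 2a^2 - b^2 is +1 or -1, never 0
assert all(2 * a * a - b * b in (1, -1) for a, b in pairs)
```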

The Indian mathematician Brahmagupta knew all this, and he came up with an even faster method of getting really good approximations by combining two known approximations. Specifically, he discovered that if *u*^{2}−2*v*^{2} = ±1 and *w*^{2}−2*x*^{2} = ±1, then putting *y* = *uw*+2*vx* and *z* = *ux*+*vw* we get a new solution *y*^{2} − 2*z*^{2} = ±1. In a later essay, I’ll explain why this seemingly miraculous way of building new solutions from old is perfectly sensible and unsurprising when viewed through the lens of algebraic number theory.
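Brahmagupta’s composition is easy to sanity-check in code. In the sketch below (notation follows the formula above), composing the near-solution *u* = 7, *v* = 5 with itself jumps straight to 99 and 70:

```python
def compose(u, v, w, x):
    """Brahmagupta's rule: from u^2 - 2v^2 = +/-1 and w^2 - 2x^2 = +/-1,
    build y = uw + 2vx and z = ux + vw, which satisfy y^2 - 2z^2 = +/-1."""
    return u * w + 2 * v * x, u * x + v * w

# (7, 5) satisfies 7^2 - 2*5^2 = -1; compose it with itself
y, z = compose(7, 5, 7, 5)
print(y, z, y * y - 2 * z * z)  # 99 70 1
```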

**THE BIGGEST MYSTERY**

There are many odd features of this imaginary stairway, equipped with a bottom but no top. One is the futility of attempting to climb it. There may be an illusion of progress, but no matter how many treads we surmount in absolute terms, we have made, in relative terms, no progress at all. For no matter how far we’ve come, the part of the stairway that we have passed is finite, while the part that remains before us is infinite; compared to what lies ahead, what lies behind is negligible. You’re always just beginning your journey; in relative terms, you never get off that first tread.

And yet we can *know* things about the number-stairway, and know them with certainty, even facts that pertain to parts of the stairway we’ll never visit. We can say, for instance, that the far reaches of the stairway contain infinitely many numbers *a* for which 2*a*^{2} − 1 is a perfect square, yet none for which 2*a*^{2} is a perfect square. The way in which the infinite stairway combines knowability and unknowability is part of its allure.

But for me, the most wondrous thing is the way the rigidly one-dimensional stairway, refracted through the human mind, kaleidoscopically unfolds into something more like a landscape than a corridor. This is a landscape not of numbers but of knowledge. The facts of math are not arranged in a line, but rather lie scattered about, and we must arrange them into patterns and then organize the patterns into some sort of story. Facts fall into networks bound together by filaments of logic, and these networks communicate with other networks, forming larger networks which are parts of even larger networks. Indeed, to know the facts beyond doubt, we must construct the sorts of stories that are called proofs. Proofs are a bit like the stairway, in that they are linear, and following the links in a chain of deductions can be a bit like climbing the stair, but devising a proof that works is quite different. The landscape of what’s true doesn’t come with a map, let alone an itinerary, and sometimes the shortest path leading us from the things we already know to the thing we want to know is extremely circuitous, and requires wandering far from where anyone has journeyed.

The staircase is linear but human thought is not. Sometimes intuition leaps over steps in a proof and then fills in those steps after the fact; Gauss himself once said “I have had my results for a long time but I do not yet know how I am to arrive at them.” But for him, a proof was more than a certificate of truth; it was also a source of understanding. A bad proof can tell you *that* something is true without telling you *why* it’s true, and even a good proof can fail to satisfy; mathematicians often want multiple proofs that illuminate some aspect of mathematical reality from various angles. Gauss himself sought proof after proof of the fundamental theorem of algebra because he wanted to understand it deeply. The quest for understanding is more central to mathematics than the quest for mere certainty.

In an earlier, more fanciful draft of this essay, I wrote: “We do not know Who built the stairway, but They did not build it for us.” But I don’t really believe in such a Them, so it seems disingenuous to try to raise readers’ goosebumps in this cheesy fashion.

Still: if there exist beings of infinite mind outside our physical universe who can encompass the stairway and all that it contains, they must have a very different relationship to mathematical truth than we do. They need no proof that counting-number solutions to 2*a*^{2} = *b*^{2} don’t exist; they just see it at a glance, by surveying all counting numbers at once. For us, *P*_{3} was true *because* *P*_{2} is true *because* *P*_{1} is true. For them, *P*_{1} is true *and* *P*_{2} is true *and* *P*_{3} is true, just because they’re self-evident. There’s a huge leveling, and a huge loss, when all mathematical facts are equally transparent. In a way I envy such mathematically omniscient beings, but in a way I pity them. They’re missing out on the stories that connect the facts, and the struggle to construct such stories when one has a finite mind and only a partial understanding of the landscape. For me, the human struggle to know what’s true, and why it’s true, is what gives the quest for mathematical knowledge its drama, its dignity, and its joy.

*Thanks to Sandi Gubin.*

**ENDNOTES**

#1. Some skepticism about this anecdote is in order here. Also, I don’t like the way it reinforces the genius myth. For more on the Gauss story, see my essay Reasoning and Reckoning. The essay you’re reading now can be viewed as something like a second draft of that earlier essay. For more on the genius myth, see my essay The Genius Box.

#2. The pioneering 20th century neuroscientist Warren McCulloch, whose ideas about neurons and computation foreshadowed advances in our own century, decided when he was young that he would devote his life to the two-part question “What is number, that man may know it, and what is man, that he may know a number?” We still don’t have a good answer.

#3. The right-hand side of *P*_{*n*} is (*n*)(*n*+1)/2, that is, the product of *n* and *n*+1, divided by 2; so, for instance, the right-hand side of *P*_{1} is (1)(2)/2 = 1.

#4. The principle of mathematical induction has many variants that are equivalent to it, so if you’re thinking “Wait, the induction step means showing that *P*_{*n*} implies *P*_{*n*+1}, not showing that *P*_{*n*} and *P*_{*n*+1} stand or fall together”, you’re right that my version differs slightly from the usual one; but the upward implication from each *P*_{*n*} to *P*_{*n*+1} is all the argument actually uses, since we only ever climb up from *P*_{1}.

#5. I think the game is interesting, so if any of you know anything about it, either because it’s already been studied by others or because you played around with it and figured some things out, please let me know in the Comments!

#6. Let *P*_{*n*} be the proposition that any downward trip that starts with *n* must end after finitely many steps. *P*_{1} is true, since a trip that starts at the bottom tread has nowhere lower to go; and if *P*_{*n*} is true then so is *P*_{*n*+1}, since after its first step a downward trip that starts at *n*+1 becomes a downward trip that starts at *n* or lower. So, by induction, every downward trip must end.

#7. The pictures above show octagons with horizontal and vertical sides, but the argument also works for canted octagons like this:

The conclusion is that you can have a regular octagon, and you can have an octagon whose corners are on the square grid, but you can’t have an octagon that achieves both feats at the same time. A variant of this argument works for grids in *d* dimensions for all *d* > 2. So if you’re one of those people who thinks that our seemingly continuous physical world is actually made up of little cubes analogous to the pixels that constitute digital photographs, then no regular octagons for you!

Magic paper helps me with some problems that have long bedeviled classroom teachers like myself: How do you find out what’s going on inside your students’ heads in the midst of a lesson without derailing it? How do you get all your students to actively participate without having the class descend into chaos? How do you communicate with a large group of students without the conversation devolving into what math educator Henri Picciotto calls a “pseudo-interactive lecture” dominated by the teacher and the two or three most vocal students?

Back in the 1980s, educator David R. Johnson tackled these problems using what he called the “paper and pencil method” of getting real-time feedback on how students are doing. Under this model, the teacher asks a question, the students write down their answers, and the teacher sees what the students wrote. This was back before magic paper, so the teacher would have to physically move around to look at the students’ responses. To make the moving-and-looking more expeditious, Johnson suggested that teachers seat students in a U-shape arrangement with the teacher stationed at the center. For more on Johnson’s ideas, see his book *Every Minute Counts* (1982, Dale Seymour Publications).

Around that same time, educator William F. Johntz pioneered an initiative called Project SEED that I had the good fortune to be exposed to while in graduate school; see the essay about it on Henri Picciotto’s math education blog.^{1} SEED had an innovative and charming way of opening up an underused communication channel from the class to the teacher. In SEED classes, hand signals played a big role: hand signals for numbers, for operations, for equality and inequality; hand signals for agreement, disagreement, partial agreement, and confusion. A teacher could ask a class a question (plant a seed, if you will) and quickly reap a rich visual harvest of information, a panoramic representation of her students’ states of mind. This provided SEED teachers with even snappier feedback than Johnson’s paper-and-pencil method, though with some limitations (there are after all only so many hand signals you can teach a class^{2}).

In the 1990s I turned my efforts wholeheartedly towards mathematical research, with teaching as a side activity that I tried to perform competently and compassionately but which didn’t arouse my highest passions. I read what people like Sheila Tobias and Alan Schoenfeld and Uri Treisman and Liping Ma were writing, and some of their ideas affected my teaching, but mostly I taught my students in the same ways that I had been taught. In particular, when I asked a question, I waited until a reasonable number of hands were raised (or until I gave up on waiting for more hands to go up; I never felt comfortable cold-calling students). I would pick someone whose hand was raised (trying to pick whichever of the hand-raisers had spoken up the least so far that day), and then respond to that person as if The Class had just spoken to me through its Chosen Representative. But of course the students who spoke up weren’t representative of the class as a whole.

Fast forward a few decades to the Covid-19 pandemic. Suddenly I was teaching over Zoom with very little relevant experience. As my time permitted, I took some online classes in how to do online teaching, and one of the tricks I learned was Chat-Storming. I quickly grew enamored of it. In Chat-Storming, I ask a question and none of the students answer right away, because I’ve told them not to. Instead, students compose answers in the Chat field of their Zoom portal but don’t press Enter/Return until I say “Okay, submit your answers.”

Then a flood of feedback drops down on my head as all the students answer at once. If it were an auditory overlay of student responses, it would just be a roar of white noise, but it’s visible, searchable, interpretable. I can’t tell which students were quick and which were slow, but I can see at a glance what the students as a whole think, and also look at individual answers in as much detail as I wish. It’s as easy to visually scan the magic paper (in this case, Zoom’s chat log) as it is to scan the hand signals in a Project SEED classroom, and the responses have a higher information content. Chat Storms use the visual information channel William Johntz championed, but with more bandwidth. As a bonus for the teacher, students aren’t able to peek at each other to try to assess whether their instincts are right or wrong based on whether the apparently “top” students agree with them; they’re on their own, and must make up their own minds.

Now I’m back in the physical classroom again, but I still use Chat Storms because they’re the best way I know to create in-class engagement that also can be used for assessment of participation^{3} and gives me realtime feedback on what students understand and what they don’t.^{4}

Does anyone know who came up with the Chat Storm? If you have any leads, please share them in the Comments!

Among its other virtues, the Chat Storm taps into the seldom-utilized positive power of boredom. In the past, I would sometimes force a class to speak by using the brutal tactic of not saying anything. If a teacher pauses for long enough, someone will break the awkward silence, once the students realize that their teacher is willing to wait as long as it takes. Chat Storms do something similar. When a Chat Storm is going on, the classroom is a really boring place. Nothing is going on except a lot of people thinking and writing on their magic paper. If you’re a student in such a classroom, you’ll quickly realize that nothing interesting is going to happen; you might as well join in the thinking and writing.

Well, maybe that’s not entirely true. If you’re stuck in a silent classroom with nothing but a smartphone, there are approximately infinity things^{5} you can do on your phone besides participating in a Chat Storm. And indeed some students who are prone to being distracted by their phones (such as students with ADHD) have told me they prefer a more traditional style of teacher-student interaction. But the majority of students like what I’m doing; they find that Chat Storms are enjoyable, keep them engaged, and provide feedback on how they’re doing.

“Excuse me, professor: could you give an example of a Chat Storm?” (I hear a reader of this essay ask).

Excellent question! I’m so glad you asked that!

When I assigned my discrete mathematics class the task of forming the logical negation of “Everybody’s a critic” using the Chat Storm format, I got a few expected wrong answers of the form “Nobody’s a critic” but also a couple of instances of a wrong answer I hadn’t expected: “At least one person is a critic.” This led to an unplanned discussion of the meaning of negation that I hadn’t realized the students needed, and I enunciated a criterion I hadn’t thought of before: if you can imagine a universe in which *p* and *q* are both false, *or* a universe in which *p* and *q* are both true, then *p* and *q* are not negations of each other. The students found that criterion helpful; I think I’ll teach it on purpose in the future.
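That classroom criterion lends itself to a tiny machine check. Here’s a Python sketch (the three-person universe and all the names are mine) that tests the wrong answer “Nobody’s a critic” and the true negation “At least one person is not a critic” against “Everybody’s a critic”:

```python
from itertools import product

# Universes of 3 people; each person either is a critic (True) or isn't.
worlds = list(product([False, True], repeat=3))

everybody = [all(w) for w in worlds]        # "Everybody's a critic"
nobody = [not any(w) for w in worlds]       # "Nobody's a critic"
someone_not = [not all(w) for w in worlds]  # "At least one person is not a critic"

# A true negation must disagree with the original in *every* universe.
# "Nobody's a critic" fails: in some universes both come out false.
assert any(p == q for p, q in zip(everybody, nobody))
# "At least one person is not a critic" passes: the two always disagree.
assert all(p != q for p, q in zip(everybody, someone_not))

print("criterion confirmed across all 8 universes")
```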

Here’s another example from my recent teaching: when explaining the basics of set theory I arranged a Chat Storm in which I solicited mnemonics for keeping ∪ (union) and ∩ (intersection) straight. In the past I have pointed out to students that ∪ looks like the U in the word Union, and that by a process of elimination the other symbol ∩ can be deduced to mean intersection (the “other thing”). Some of the students had the same mnemonic I’d come up with. But one suggested a mnemonic I hadn’t seen before, pointing out that ∩ resembles the lower-case “n” in “intersection”. I think that’s a keeper too!

I happen to use Zoom and Chat but I know of people who use other kinds of magic paper: Miroboards, Mentimeter, Mattermost, and others. There’s also polling software, but polls often need to be prepared in advance, and many polling systems only allow multiple choice. One thing I like about Chat is its spontaneity (I can whip up a Chat Storm on the whim of a moment) and its open-endedness (if a student wants to include a joke or comment in their answer the system will permit this expression of their individuality).

But going back to what I wrote in the first sentence of this essay, the fact is, the year 2022 *is* bad science fiction, just as 2020 and 2021 were; if I’d prophetically written an accurate account of the pandemic back in the 1970s and tried to sell it as a novel, no publisher would have touched it. (“Dear Sir: This blatant *Andromeda Strain* ripoff somehow manages to be scientifically over-detailed, politically implausible, mind-numbingly boring, and deeply depressing all at the same time.”) Yet the ongoing viral storm has had some silver linings. The best one is the advent of mRNA vaccines, which are likely to have marvelous applications to improving people’s health in years to come. But somewhere down my private list of silver linings I’d put this new way of engaging my students.

So if you walk by my classroom and see my students glued to their phones while I’m standing by silently, looking around at students who aren’t looking at me, don’t assume that nothing is happening; a Storm is probably brewing.

*Thanks to Sandi Gubin, Henri Picciotto, and my discrete mathematics students.*

**ENDNOTES**

#1: Project SEED was a wonderful embodiment of the idea that you can set students up to discover deep mathematical ideas for themselves through artfully constructed activities. A teacher might start a SEED class by asking “What do you get when you add an odd number of odd numbers?” and then guiding the ensuing discussion. I could say a lot more about the Project SEED approach to teaching and about the many aspects of it that resonate with me, but I wouldn’t do a better job than Henri Picciotto has already done in his essay.

#2: I wonder how subjects are taught at Gallaudet University. Since everyone there is proficient in American Sign Language, there’d be opportunities for a whole class to respond to a teacher simultaneously in ASL; that might work really well not just for math but for other subjects too.

#3: I’ve written C programs and shell scripts that allow me to quickly determine, using the .txt files Zoom creates, how often each student wrote something in the chat at that particular class meeting. I’m happy to share the programs with anyone who’s interested.
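The scripts themselves aren't reproduced here, but a minimal Python sketch of the idea might look like the following. The line format it assumes (a timestamp, a tab, the sender's name, a colon) is only an approximation; Zoom's saved-chat format varies from version to version, so the pattern would need adjusting in practice.

```python
from collections import Counter
import re

# Count how many chat messages each participant sent, given the lines of
# a saved Zoom chat .txt file. ASSUMPTION: each message starts a line of
# the form "HH:MM:SS<tab>Name:<tab>message"; adjust the pattern to match
# whatever format your version of Zoom actually produces.
MESSAGE_LINE = re.compile(r"^\d\d:\d\d:\d\d\t([^:]+):")

def chat_counts(lines):
    counts = Counter()
    for line in lines:
        match = MESSAGE_LINE.match(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts

sample = [
    "00:01:02\tAlice:\t2x + 1",
    "00:01:03\tBob:\t2x + 2",
    "00:01:09\tAlice:\toops, I mean 2x + 2",
]
print(chat_counts(sample))  # Counter({'Alice': 2, 'Bob': 1})
```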

#4: It’s fun to structure a sequence of Chat Storms that allows students to advance their understanding in manageable increments, and deeply satisfying to watch things sink into their brains, as evidenced by rising performance over the course of ten minutes. Of course, sometimes hardly anyone gets the right answer to a question I’ve asked. That means I didn’t design that part of the class well, and that’s never an enjoyable discovery. But the discovery enables me to learn what’s not working and fix it.

#5: There’s a wonderful exhibit at the Boston Museum of Science illustrating just how versatile smartphones are. It’s a replica of a room full of dozens of appliances, every single one of which has been rendered more or less obsolete by the smartphone. Even the stapler? Well, there are apps that have the specific purpose of combining multiple pieces of magic paper into a single magic document. Some of my students use those apps when they submit their homework on Blackboard. So yes, even the stapler.

But first, where did polynomials come from?

**THE ART OF THE THING**

“Thing” is a marvelously flexible word, as are similar words like “*res*” and “*cosa*” that other languages have used to signify unspecified objects. Often the word denotes a group of people who have come together for some purpose: think of the Roman Republic (the “public thing”) or the Cosa Nostra (“Our Thing”). Curiously, the English word “thing” itself seems to have traveled in the opposite direction, starting out as meaning an assembly of people and ending up as meaning, well, any-thing. Math has made its own uses of nonmathematical words for indefinite objects: in Indian and Arabic algebra, the quantity being sought was often called “the thing”. It was natural for European algebraists to borrow this usage, and indeed Renaissance algebra was sometimes referred to as “the art of the thing”. (See Endnote #1.)

Of course polynomials predate Europe; mathematicians around the world used polynomials two thousand years ago, back in the days when math was made of words and algebra was rhetorical (see Endnote #2). But when you think of polynomials you probably think of the way they’re written in modern high schools, which means you’re thinking of the modern, symbolic approach to polynomials pioneered by Rafael Bombelli. I wrote about Bombelli in my essay on complex numbers. His 1572 book *L’Algebra* popularized a version of exponential notation similar to the one we use today and he gave clear rules for how to perform arithmetic operations on polynomials. For instance, here’s an approximate translation of Bombelli’s description of how we can multiply simple polynomials like 3*x*^{4} and 5*x*^{6} (where he refers to powers as “dignities” and exponents as “abbreviatures”):

*When one has to multiply dignities one adds the numbers of the abbreviatures written above, and from those will be formed an abbreviature of dignities, and the numbers that stand below that dignity are simply multiplied.*

That is, to multiply 3*x*^{4} by 5*x*^{6} we multiply the “numbers below” (the 3 and the 5) to get 15, and we add the “abbreviatures” (the 4 and the 6) to get 10, obtaining 15*x*^{10} as the answer.
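In modern terms, Bombelli's rule is a one-liner. Here's a sketch (the pair representation and the function name are mine, not Bombelli's): a monomial like 3*x*^{4} becomes the pair (3, 4), and multiplication multiplies the "numbers below" while adding the "abbreviatures".

```python
# Represent the monomial c*x^e as the pair (c, e).
def multiply_monomials(m1, m2):
    (c1, e1), (c2, e2) = m1, m2
    # Bombelli's rule: multiply the coefficients, add the exponents.
    return (c1 * c2, e1 + e2)

# 3x^4 times 5x^6:
print(multiply_monomials((3, 4), (5, 6)))  # (15, 10), i.e. 15x^10
```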

Polynomial notation gave algebra a new economy of expression, but more important was the new viewpoint that polynomials brought in, relating solving equations to factoring polynomials. Consider the equation *x*^{2} + 2 = 3*x*; it’s not hard to check that *x* = 1 and *x* = 2 are solutions, but how can we be sure that there aren’t more? On the other hand, consider the equation (*x*−1)(*x*−2) = 0. It’s algebraically equivalent to *x*^{2} + 2 = 3*x* (to see why, expand the left hand side of (*x*−1)(*x*−2) = 0 and do some rearranging), but it’s more eloquent in explaining to us why there are no solutions we’ve overlooked. For, if *x* is equal to neither 1 nor 2, then *x*−1 can’t be 0 (because *x* isn’t 1) and *x*−2 can’t be 0 (because *x* isn’t 2); but then (*x*−1)(*x*−2), being the product of two nonzero numbers, can’t be 0. In modern algebra, the fact that the product of two nonzero numbers cannot be zero gets promoted from boring truism to organizing principle. In particular, it implies that a polynomial equation of degree *n* has at most *n* solutions.
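If you'd like to see the equivalence with your own eyes rather than take my word for it, a brute-force check (my own sketch, not part of the historical story) takes only a few lines of exact rational arithmetic:

```python
from fractions import Fraction

# Check that x^2 + 2 = 3x and (x-1)(x-2) = 0 agree at many sample points,
# and that the only roots among those points are 1 and 2.
samples = [Fraction(n, 7) for n in range(-70, 71)]  # includes 1 and 2
for x in samples:
    original_holds = x**2 + 2 == 3*x
    factored_holds = (x - 1)*(x - 2) == 0
    assert original_holds == factored_holds       # the two equations agree
    assert (not factored_holds) or x in (1, 2)    # the only roots are 1 and 2

print("both equations hold at exactly x = 1 and x = 2 on the sample set")
```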

The word “polynomial” goes back to the writings of René Descartes, who cobbled the word together using the Greek prefix “poly” meaning “many” and the Latin root “nomen” meaning “name”. Here “poly” refers to the fact that a polynomial can be a sum of many terms that feature *x* raised to various powers. When there’s just one term, as in the polynomial *x*^{4}, we refer to the polynomial as a *monomial* (even though “mononomial” would be more apt). When there are two terms, as in the polynomial *x*^{4} + *x*^{3}, we refer to the polynomial as a *binomial*. (See Endnote #3.)

**MAKING A DIFFERENCE**

Polynomials have many important applications in the sciences, but the following party trick is not one of them: Have a friend pick four counting numbers *a*, *b*, *c*, and *n* but **not** reveal them to you; instead, they are to reveal to you the values of the polynomial *ax*^{2} + *bx* + *c* at *x* = *n*, *x* = *n*+1, and *x* = *n*+2. Your job is to announce back, faster than your friend could compute them, the values of the polynomial at *x* = *n*+3, *x* = *n*+4, and *x* = *n*+5.

For instance, say they picked *a*, *b*, *c*, and *n* to all equal 1 (but you don’t know that), so they reveal to you the values of the polynomial *x*^{2} + *x* + 1 at *x* = 1, 2, and 3, namely 3, 7, and 13. You start by writing a table like this:

    3   7   13   __   __   __
      4    6
        2

The first row of this table consists of the three numbers your friend revealed (call them *r*_{1}, *r*_{2}, and *r*_{3}), with spaces reserved for the three numbers you’re going to eventually announce back. The second row gives the differences *r*_{2} − *r*_{1} and *r*_{3} − *r*_{2}, while the third row gives the difference-of-differences (*r*_{3} − *r*_{2}) − (*r*_{2} − *r*_{1}). (Check: 7 − 3 = 4, 13 − 7 = 6, and 6 − 4 = 2.) We say that the second row is the *difference sequence* of the first row (and likewise the third row is the difference sequence of the second row).

Now extend that bottom row by repeating its sole entry three more times:

    3   7   13   __   __   __
      4    6
        2    2    2    2

Next, fill in the second row of the table in such a way that the third row is the difference-sequence of the second row:

    3   7   13   __   __   __
      4    6    8   10   12
        2    2    2    2

(Check: 8−6 = 2, 10−8 = 2, and 12−10 = 2. Of course I didn’t just guess the numbers 8, 10, and 12 at random and happen to be lucky; I calculated them successively as 6 + 2 = 8, 8 + 2 = 10, and 10 + 2 = 12.)

Finally, fill in the first row of the table in such a way that the second row is the difference-sequence of the first row:

    3   7   13   21   31   43
      4    6    8   10   12
        2    2    2    2

(Check: 13+8 = 21, 21+10 = 31, and 31+12 = 43.) You announce the numbers 21, 31, and 43, and your friend checks that these are indeed the values of 4^{2}+4+1, 5^{2}+5+1, and 6^{2}+6+1.

And now, if you really want to be impressive, you can race your friend to compute the next five terms. You’ll probably win by just extending your table out five more places, because your arithmetic will be simpler than theirs! It’s possible that you and your friend will get different answers for one of the numbers. Then you can pull out your phone, open the calculator app, and see who’s right and who’s wrong. My guess is that your friend is wrong, because the procedure they’re following involves more steps.
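If you'd rather let a machine play your part in the trick, the three rows of the table can be carried along as three lists; this sketch (variable names mine) performs exactly the steps described above:

```python
# The friend reveals 3, 7, 13 (values of x^2 + x + 1 at x = 1, 2, 3).
first = [3, 7, 13]
second = [b - a for a, b in zip(first, first[1:])]  # differences: [4, 6]
third = [second[1] - second[0]]                     # difference of differences: [2]

# Extend: repeat the constant bottom entry, then add upward, three times.
for _ in range(3):
    third.append(third[-1])                # 2, 2, 2, ...
    second.append(second[-1] + third[-1])  # 8, 10, 12
    first.append(first[-1] + second[-1])   # 21, 31, 43

print(first[3:])  # [21, 31, 43]
```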

Why does the trick work? Look back at the table. Notice that the second row is an arithmetic progression, with each term increasing by the same amount (namely 2) as one progresses from left to right. I show in Endnote #4 that if you write the successive values taken on by a quadratic polynomial and compute the successive differences, you always get an arithmetic progression. That is, if you list successive values taken on by a polynomial function of degree 2, and you take the differences, those differences will be the successive values taken on by a polynomial function of degree 1. And this trick isn’t limited to polynomials of degree 2; see Endnote #5.

We call the tables we generate in this way difference tables. This game of operating on polynomials to get new polynomials of lower degree (deriving a row of the table from the row above) and reversing the process (deriving a row of the table from the row below) is called the calculus of finite differences, not to be confused with the infinitesimal calculus of Isaac Newton and Gottfried Leibniz. (Incidentally, even before Leibniz and Newton were born, Indian mathematicians in Kerala applied the calculus of finite differences to the sine and cosine functions and obtained the power series expansions of these functions! I’ll tell you this story soon.)

I propose a puzzle you might wish to think about using the ideas of this section: Show that if *p*(·) is a polynomial of degree *d* and if *p*(1), *p*(2), . . . , and *p*(*d*+1) are all integers, then *p*(*n*) is an integer for all integers *n*. (An example of such a polynomial is my “Personal Polynomial” (5/2)*x*^{2} − (17/2)*x* + 16 which I wrote about last month.) For a solution, see Endnote #6.

**PATTERNS IN THE POWERS**

I have no talent for grave-robbing, nor am I consumed by curiosity about the respective roles played by heredity and environment in determining a person’s mathematical ability; but were I so talented, and so consumed, I’d consider digging up the bones of the Bernoullis to scrape together some of their DNA. These eight mathematicians constituted a kind of European mathematical nobility for nearly a century. If there’s anything like a math gene, one might suppose that the Bernoullis had it. (Do historians of science know anything about the women of that family? Given that they shared the heredity and environment of the men, one would expect that they too would have displayed mathematical talent even if they were never given a chance to develop it.)

The Bernoulli numbers — the sequence of numbers 1, 1/2, 1/6, 0, −1/30, 0, 1/42, 0, −1/30, 0, 5/66, … — were named after a member of the Bernoulli family, but he didn’t discover them. Priority belongs to the German weaver-surveyor-mathematician Johann Faulhaber (1580-1635), who had investigated them a hundred years earlier.

The roots of Faulhaber’s work were ancient. It had been known for millennia that 1 + 2 + 3 + … + *n* is equal to (1/2)*n*^{2} + (1/2)*n* for all *n*, and for nearly as long that 1^{2} + 2^{2} + 3^{2} + … + *n*^{2} is equal to (1/3)*n*^{3} + (1/2)*n*^{2} + (1/6)*n* for all *n* and that 1^{3} + 2^{3} + 3^{3} + … + *n*^{3} is equal to (1/4)*n*^{4} + (1/2)*n*^{3} + (1/4)*n*^{2} for all *n*.
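These closed forms are easy to check numerically; exact rational arithmetic avoids any floating-point doubts (a quick sketch of mine, not Faulhaber's method):

```python
from fractions import Fraction as F

# Verify the three ancient closed forms for power sums, for n = 1..50.
for n in range(1, 51):
    s1 = sum(k for k in range(1, n + 1))
    s2 = sum(k**2 for k in range(1, n + 1))
    s3 = sum(k**3 for k in range(1, n + 1))
    assert s1 == F(1, 2)*n**2 + F(1, 2)*n
    assert s2 == F(1, 3)*n**3 + F(1, 2)*n**2 + F(1, 6)*n
    assert s3 == F(1, 4)*n**4 + F(1, 2)*n**3 + F(1, 4)*n**2

print("all three closed forms check out for n up to 50")
```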

Faulhaber found a formula in which certain mysterious multipliers played a role. These multipliers are what we now call the Bernoulli numbers. Faulhaber computed the first dozen or so of them and then stopped; he had demonstrated a way to find them, albeit an arduous one, and that was progress enough for him.

Faulhaber’s work was forgotten for a century. Then two mathematicians working completely independently rediscovered what Faulhaber had known: the Japanese mathematician Seki Takakazu (1642-1708) and the Swiss mathematician Jacob Bernoulli (1654-1705). Neither mathematician published his results while he was still alive; Takakazu’s result was published in 1712, while Bernoulli’s was published in 1713 as *Summae Potestatum*.

Bernoulli’s elation at his discovery is evident from his words, and his joy is laced with a bit of *schadenfreude*:

*I have found in less than a quarter of an hour that the tenth powers of the first thousand numbers beginning from 1 added together equal 91,409,924,241,424,243,424,241,924,242,500, from which it is apparent how useless should be judged the works of Ismael Bullialdus, recorded in the thick volume of his Arithmeticae Infinitorum, where all he accomplishes is to show that with immense labor he can sum the first six powers — part of what we have done in a single page.*
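What took Bernoulli under a quarter of an hour of cleverness (and would have cost Bullialdus's method immense labor) is now a one-line brute-force computation:

```python
# Sum of the tenth powers of 1 through 1000, the figure Bernoulli boasted
# of computing by hand in under fifteen minutes.
total = sum(n**10 for n in range(1, 1001))
print(total)  # 91409924241424243424241924242500
```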

Bernoulli’s analysis laid more emphasis on the numbers he called A, B, C, D, etc. than Takakazu’s did, and Bernoulli was part of the European mainstream, so it’s natural that Leonhard Euler named these numbers Bernoulli numbers and not Takakazu numbers.

**STEAM DREAMS**

The English mathematician Charles Babbage had a dream.

It started in 1821, when young Charles and his friend John Herschel, charter members of the newly founded London Astronomical Society, were bemoaning the unreliability of published astronomical tables. Existing tables often had errors, whether caused by the people who computed the numbers (called “computers” in those days) or by the typesetters who recorded the numbers in print. Meanwhile, elsewhere in England, steam was doing amazing things to amplify the powers of the human body. As Babbage would later write:

*Mr. Herschel . . . brought with him the calculations of the computers, and we commenced the tedious process of verification. After a time many discrepancies occurred, and at one point these discordances were so numerous that I exclaimed, “I wish to God these calculations had been executed by steam,” to which Herschel replied, “It is quite possible.”*

Mechanical adding machines already existed; the philosopher-mathematician Blaise Pascal had himself invented one in 1642. Babbage realized that by taking the components of such machines and hooking them together in new ways, he could create machines that would automatically tabulate the values of polynomial functions.

Let’s see how a small Difference Engine (for that is what Babbage called his invention) could have tabulated the values of the polynomial *n*^{2} + *n* + 1 that we met at a party earlier in this essay. Picture a machine with three numerical registers that evolve over time, with each number represented by the states of various gears as in a Pascal adding machine. At the beginning of the machine’s performance of its task the registers show the three numbers 3, 4, and 2

(never mind why those specific numbers, at least for now). Then the device adds the 4 in the second register to the 3 in the first register and adds the 2 in the third register to the 4 in the second register, leaving the 2 in the third register alone; the registers now show 7, 6, and 2.

(The new values of the registers are sums of the old values: 7 is 3+4, 6 is 4+2, and 2 is just 2.) Then the device adds the 6 in the second register to the 7 in the first register and adds the 2 in the third register to the 6 in the second register, once again leaving the 2 in the third register alone; the registers now show 13, 8, and 2.

Then it executes the same procedure again, arriving at 21, 10, and 2.

And so on.

Do these numbers look familiar? They should! Babbage’s machine is constructing the party trick’s difference table

one diagonal at a time. If we wanted a table of values of the polynomial *n*^{2} + *n* + 1, all we’d have to do is connect the Difference Engine to a suitable printer and have it print out the numbers that successively occur in the first register.
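The gears can be simulated in a few lines; this toy version (mine, of course, not Babbage's) keeps the three registers as ordinary variables and repeats the add-the-register-below step:

```python
# A toy simulation of a three-register Difference Engine tabulating
# n^2 + n + 1, starting from the registers 3, 4, 2 described above.
r1, r2, r3 = 3, 4, 2
tabulated = [r1]
for _ in range(5):
    r1, r2 = r1 + r2, r2 + r3  # each register absorbs the one below it
    tabulated.append(r1)

print(tabulated)  # [3, 7, 13, 21, 31, 43]
```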

Earlier I wrote: “Polynomials have many important applications in the sciences, but the following party trick is not one of them.” That’s true enough. But we also saw (in the final part of the trick) that the paper-and-pencil technology of difference tables gave you a better way to compute values of your friend’s polynomial than your friend’s straightforward way. Also, we just saw how Babbage realized that you and your hand could be replaced by infallible (or at least less-fallible) gears, and that the more-accurate tables produced by such mechanical processes would have scientific (and industrial and military) uses. So yes, it’s a party trick. But it’s a party trick with a direct thematic link to Babbage’s Difference Engine. And as we’ll soon see, Babbage didn’t stop there.

**ADA LOVELACE**

Annabella Milbanke was not exactly George Gordon’s “type”, and he was certainly not hers. The latter circumstance was part of her appeal; her initial indifference intrigued and attracted him. (Or so at least I surmise; I wasn’t there.) But let’s give Annabella and George their titles: Annabella Milbanke Baroness Wentworth and George Gordon Lord Byron. Yes, that Byron. The memorable description of the iconic poet and scoundrel as “mad, bad, and dangerous to know” was coined not by his enemies but by his more-than-friend Lady Caroline Lamb. Milbanke would be known in her later years as a champion of progressive causes, such as vaccination. Byron admired the intelligent young Annabella and in a letter to her aunt dubbed the bookish heiress a “Princess of Parallelograms”. Annabella rejected his first proposal of marriage but unwisely accepted the second. When a libertine and a moralist fall in love in a romantic comedy, each has a moderating influence on the other, but in real life, clashing natural proclivities can lead to ever-greater polarization, and such I think was the case with Annabella and George. After only a few years their marriage ended (“I heard she moved out! She gave up on the marriage!” “Well, *I* heard she only moved *out* after he brought one of his lovers *in*!”), but before that they had a baby together, a child who was mere months old when her parents separated.

Young Ada showed an even greater aptitude for mathematics than her mother had and Annabella encouraged the girl, hoping that the pursuit of mathematics, and more broadly the cultivation of her faculties of reason, would protect her from the genetic taint of her father’s unbalanced temperament (or, some might say, insanity). The end result of Annabella’s efforts was a daughter who pursued what Ada called “poetical science”, eventually earning her (from an admiring Babbage) a nickname that surpassed the one Byron had given her mother: “Enchantress of Number”.

Ada met Babbage in 1833 at a party at Babbage’s house. The teenager must have made a good impression on the quadragenarian, because he invited her and her mother to attend a demonstration of his newly constructed Difference Engine (a prototype of a larger version he hoped to build): a two-foot-tall machine with 2000 brass parts that was powered by a hand crank and could have printed out mathematical tables if only Babbage had completed the envisioned printer that went with it.

Babbage’s plans for his Difference Engine ran aground on the shoals of several problems. One was that machining parts to the required precision was a more arduous and expensive proposition than he had realized when he first proposed his scheme to the British government as a way to automate the production of mathematical tables. Someone with more business sense or political savvy might have been able to handle these delays and overruns, but the unworldly Babbage wasn’t up to the task. Besides, he was distracted by a vision of an even greater machine that could solve more interesting problems than computing values of polynomials. He called the envisioned device his Analytical Engine.

In 1840, Babbage, having failed to interest the British government in supporting his work, visited Italy to give a talk about the Analytical Engine and thereby drum up enthusiasm and funding overseas. His talk inspired the Italian mathematician Luigi Menabrea to publish a description of the engine in French. Ada (now married to William King-Noel, First Earl of Lovelace) took on the project of translating Menabrea’s article into English, and decided to enhance the article with Notes of her own (notes so comprehensive that they eventually dwarfed what Menabrea had written). In her notes, Lovelace limns a future in which mechanical devices will be able to do such things as compose music, if only we tell them quite precisely the mathematical rules governing musical composition. But “Sketch of the Analytical Engine Invented by Charles Babbage, with Notes by the Translator” isn’t just a prophecy of a coming age of machine intelligence; it also contains what some have called the first true computer program. And that program computes Bernoulli numbers.

Computing Bernoulli numbers is a good deal more complicated than computing the values of polynomials; see Target’s article if you want to know more. One reason Lovelace chose this computing challenge was that it showcased a feature of the Analytical Engine that the Difference Engine had lacked: the ability of a computation to enter a loop, or as Babbage put it, to “eat its own tail”. Nowadays we recognize the momentous significance of the difference between the two designs: in principle, a machine like the Analytical Engine had crossed over into the domain of universal computation.
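Lovelace's actual program was written for hardware that never existed, so I won't try to reproduce it here; but for the curious, here's a sketch of the textbook recurrence for Bernoulli numbers (not her algorithm), adjusted to the B_1 = +1/2 sign convention of the list given earlier:

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(count):
    """First `count` Bernoulli numbers, via the standard recurrence
    sum_{j=0}^{m} C(m+1, j) B_j = 0, which determines each B_m from its
    predecessors (this recurrence yields the B_1 = -1/2 convention)."""
    B = [Fraction(1)]
    for m in range(1, count):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

# Flip the sign of the odd-indexed entries (only B_1 is nonzero among
# them) to match the B_1 = +1/2 list quoted earlier in this essay.
B_plus = [(-1) ** n * b for n, b in enumerate(bernoulli_numbers(11))]
print(B_plus[:4])  # 1, 1/2, 1/6, 0, ...
```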

Whether or not one chooses to regard Lovelace’s program as the first computer program ever written, it was certainly the most complicated set of instructions for a mechanical computation that had ever been described up till then. Computer science bloggers Jim Randall, Sinclair Target, and Stephen Wolfram noticed that at one point in her program there’s a mistake: the numerator and denominator of a fraction have been swapped. The first programmer was also the creator of the first programming bug! But I don’t think the bug does her any discredit; it points to the magnitude of her ambition and the complexity of the task she had chosen to undertake. As Target asks, if you aren’t writing bugs, are you writing real programs?

Alas, Babbage’s dreams were bigger than his budget and his managerial capabilities. The Engine was never built. Ada died of cancer in 1852 at the age of 37, and Charles died in 1871, a bitter and disappointed old man. The Victorian Computer Age never dawned (though authors of steampunk fiction keep wondering “What if …?”).

*If* the Analytical Engine had been built in her lifetime, I have no doubt that Lovelace would have found the bug in her program. And while we’re talking about mistakes, did any of you notice that the page from Bernoulli’s “Summae Potestatum” that I showed you earlier contains an error? That last term in the second-to-last polynomial in his table should be −3/20 *nn*, not −1/12 *nn*. Bernoulli was a creative human being, not a mathematical engine. (Though I have no doubt that *if* you’d told Bernoulli he’d made a mistake in his table, he would’ve found it in a lot less than fifteen minutes.)

The history of math shows time and time again that the giants of mathematics aren’t flawless paragons of reason who never err; they’re humans who discover new vistas, explore them, have creative ideas (some of which work), inevitably stumble as they traverse landscapes never seen before, recover from their stumbles, and move on. Indeed, the creative faculty of human beings — not to be found in the Analytical Engine, which Lovelace famously wrote can only do “whatever we know how to order it to perform” — may share roots with the human propensity for error. The Latin root for “error” is the word for wandering. If we don’t wander off beaten paths, how will we know what vistas we’re missing?

*Thanks to Sandi Gubin, Eliana Propp-Gubin, and Stephen Wolfram.*

**REFERENCES**

Sarah Baldwin, Ada Lovelace and the Analytical Engine.

Janet Beery, Sums of Powers of Positive Integers – Jakob Bernoulli (1654-1705), Switzerland.

Peter Cameron, Polynomials taking integer values.

A. W. F. Edwards, “Sums of powers of integers: a little of the history”, The Mathematical Gazette, Vol. 66, No. 435 (Mar., 1982), pp. 22-28.

Martin Gardner, “The Calculus of Finite Differences”, chapter 20 in “New Mathematical Diversions”.

Silvio Maracchia, The importance of symbolism in the development of algebra, *Lettera Matematica* volume 1, pages 137-144 (2013).

Burkard Polster, “Power sum MASTER CLASS: How to sum quadrillions of powers … by hand! (Euler-Maclaurin formula)”: Mathologer channel, https://youtu.be/fw1kRz83Fj0 .

Duana Saskia, Discovering Ada’s Bernoulli Numbers, Part 1. (Alas, there seems to be no Part 2!)

Sinclair Target, What Did Ada Lovelace’s Program Actually Do?

Stephen Wolfram, Untangling the Tale of Ada Lovelace.

**ENDNOTES**

#1. Algebraists were sometimes called “cossists”, which I suppose could be translated as “thingologists”.

#2. The wordy kind of number-talk that people used in the old days really is called “rhetorical algebra”. I’m not kidding. Maybe high school algebra teachers today should spice things up in the classroom by using a broader range of classic rhetorical devices: *accismus* (“Whatever you do, *don’t* subtract *x* from both sides!”), *adynata* (“You’ll find a solution to *x* = *x*+1 when cows do calculus”), *antimeria* (“Let’s see if our old friend Mister Quadratic Formula can help us out here”), etc.

#3. The word “binomial” occurs in a famed assemblage of late-19th-century English cultural trivia called “The Major General’s Song”, from Gilbert and Sullivan’s operetta *The Pirates of Penzance*. In the song, career soldier Major General Stanley, while conceding his complete lack of military knowledge, boasts:

*I’m very well acquainted, too, with matters mathematical;/ I understand equations, both the simple and quadratical./ About binomial theorem I’m teeming with a lot o’ news / With many cheerful facts about the square of the hypotenuse.*

Likewise, in Conan Doyle’s story “The Final Problem”, we are meant to infer that Sherlock Holmes’ nemesis Moriarty is Holmes’ intellectual equal when Holmes tells Watson:

*“He [Moriarty] is a man of good birth and excellent education, endowed by nature with a phenomenal mathematical faculty. At the age of twenty-one he wrote a treatise upon the Binomial Theorem, which has had a European vogue. On the strength of it he won the Mathematical Chair at one of our smaller universities, and had, to all appearances, a most brilliant career before him.”*

Of course the binomial theorem of Newton was old news by the time Holmes came on the scene. But I like to imagine that Holmes was thinking of the *q*-binomial theorem, which (as part of the broader subject of *q*-series) was a hot topic on the Continent in the 19th century.

#4. Write the polynomial *a* *x*^{2} + *b* *x* + *c* as *p*(*x*) for short, and define *q*(*x*) as *p*(*x*+1) − *p*(*x*), so that the numbers in the second row of the table are *q*(*n*), *q*(*n*+1), . . . Expanding the expression *p*(*x*+1) − *p*(*x*) and regrouping, we get

*q*(*x*) = (*a* (*x*+1)^{2} + *b* (*x*+1) + *c*) − (*a* (*x*)^{2} + *b* (*x*) + *c*) = *a* ((*x*+1)^{2}−*x*^{2}) + *b* ((*x*+1)−*x*) + (*c*−*c*) = *a*·(2*x*+1) + *b*·(1) = (2*a*)*x* + (*a*+*b*) ,

a linear function of *x*. So the numbers in the second row form an arithmetic progression.

#5. If your friend gives you *d*+1 successive values of some polynomial of degree *d*, you can find successive terms using a trapezoidal array with *d*+1 rows instead of just three. That’s because if *p*(*x*) is a polynomial of degree *d*, the polynomial *p*(*x*+1) − *p*(*x*) simplifies to give a polynomial of degree *d*−1. So if the top row of your table gives *d*+1 successive values of a polynomial of degree *d*, the second row will give *d* successive values of a polynomial of degree *d*−1; likewise the next row will give *d*−1 successive values of a polynomial of degree *d*−2; and so on, with the *d*th row giving 2 successive values of a polynomial of degree 1 and the final (*d*+1st) row giving the value of a polynomial of degree 0. And a polynomial of degree 0 is just a constant function of *x*; once you know its value at *x* = *n*, you know its value at *x* = *n*+1, *n*+2, etc.! So you can fill in the last row, which lets you fill in the second-to-last row, and so on, all the way back up to the top. So you can fill in that top row and announce those numbers while your friend is still squaring and swearing.
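The whole trapezoidal procedure fits in a short function; this sketch (names mine) builds the rows of differences and then fills the array back in from the bottom up:

```python
def extend_by_differences(values, extra):
    """Given d+1 successive values of a degree-d polynomial, return them
    extended by `extra` further values, using only the difference table
    (additions and subtractions — no squaring and swearing)."""
    # Build rows of successive differences down to the constant row.
    rows = [list(values)]
    while len(rows[-1]) > 1:
        prev = rows[-1]
        rows.append([b - a for a, b in zip(prev, prev[1:])])
    # Repeat the constant bottom entry, then propagate sums upward.
    for _ in range(extra):
        rows[-1].append(rows[-1][-1])
        for i in range(len(rows) - 2, -1, -1):
            rows[i].append(rows[i][-1] + rows[i + 1][-1])
    return rows[0]

# Degree 3: four values of n^3 determine all the rest.
print(extend_by_differences([1, 8, 27, 64], 2))  # [1, 8, 27, 64, 125, 216]
```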

#6. Since the top row consists of integers, the whole triangle beneath it consists entirely of integers as well (since the difference of two integers is always an integer). But the bottom row represents a constant polynomial, so the bottom row extends to give infinitely many repetitions of that integer. Now fill in the difference table going upward; the sum of two integers is always an integer, so you’ll never see any non-integer values anywhere in the extended table, including in its top row. So this tells us that *p*(*d*+2), *p*(*d*+3), etc. are all integers as well. Technically this argument only handles *p*(*n*) as *n* goes to positive infinity, not as *n* goes to negative infinity, but by extending the difference table to the left as well as the right we can take care of this case too. I learned of this pretty application of difference tables from Peter Cameron’s blog (see the References).

#7. If you take the sequence whose *n*th term is 1^{k} + 2^{k} + 3^{k} + … + *n*^{k} and form its difference sequence, you’ll just get the sequence of *k*th powers whose terms are of course given by a polynomial of degree *k*. You might say that the sequence 1^{k}, 1^{k}+2^{k}, 1^{k}+2^{k}+3^{k}, … is the “anti-difference” of the sequence 1^{k}, 2^{k}, 3^{k}, … Since taking the difference sequence of a polynomial decreases its degree by 1, it makes sense that taking the anti-difference increases the degree by 1. And this fact from the calculus of finite differences foreshadows something similar that happens in the infinitesimal calculus, where differentiating a polynomial reduces its degree by 1 and anti-differentiating it increases its degree by 1. This is just one of many profound similarities between the calculus of finite differences and the differential calculus.
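Here's the degree bookkeeping in action for k = 3 (a quick numerical check of mine): the cubes need three rounds of differencing to reach a constant, while their anti-difference (the running sums) needs four.

```python
from itertools import accumulate

def diff(seq):
    return [b - a for a, b in zip(seq, seq[1:])]

cubes = [n**3 for n in range(1, 11)]  # values of a degree-3 polynomial
sums = list(accumulate(cubes))        # their anti-difference: degree 4

d = cubes
for _ in range(3):
    d = diff(d)                       # three rounds for degree 3
e = sums
for _ in range(4):
    e = diff(e)                       # four rounds for degree 4

print(d, e)  # both rows are constant (all entries equal 6 = 3!)
```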

Mathematicians celebrate the French thinker René Descartes for inventing Cartesian coordinates.^{1} But we should also remember him as the person who tilted the terrain of Europe’s mathematical alphabet, using early letters of the alphabet to signify known quantities and imbuing later letters (especially *x*) with the pungent whiff of the Unknown. If you learned to write quadratic expressions as *ax*^{2} + *bx* + *c* instead of *xa*^{2} + *ya* + *z* (and I’m guessing you did), it’s down to Descartes.^{2}

My topic this month is polynomials like *ax*^{2} + *bx* + *c*. In school math, you first learned about *x* as an unknown, a number hiding behind a mask. (“What is *x*? Let’s find out.”) Later you learned to view *x* as a variable, so that a formula like *y* = *ax*^{2} + *bx* + *c* is a function or rule: if you give me an *x*, I’ll give you a *y*. (“What is *x*? No number in particular; *x* ranges over all real numbers.”) I’ll touch on both points of view today, but I’ll be stressing a viewpoint that’s probably less familiar, where *x* is neither an unknown nor a variable, but just, well, itself. From this perspective, polynomials appear as number-like objects in and of themselves, with their own habits and mating behavior.

Let’s start with something a bit silly. The Global Math Project website has a polynomial with your name on it. Literally. Go to http://globalmathproject.org/personal-polynomial/ and type in your name, and the website will give you a mathematical expression that (unless you go by “JJ” or some other moniker whose letters are all the same) will contain one or more occurrences of the variable *x*; say hello to your Personal Polynomial. What makes it your Personal Polynomial is that if you replace *x* by the number 1, the expression turns into the numerical value of the 1st letter of your name (where A = 1, B = 2, etc.); if you replace *x* by the number 2, the expression turns into the numerical value of the 2nd letter of your name; and so on. For instance, when I typed in “JIM” the Personal Polynomial genie gave me the degree-two polynomial (5/2)*x*^{2} − (17/2)*x* + 16; with *x* = 1, my Personal Polynomial turns into 5/2 − 17/2 + 16 = 10, and sure enough, the 1st letter of my name is the 10th letter of the alphabet, J. Likewise, plugging in *x* = 2 gives 9 (the numerical value of I) and plugging in *x* = 3 gives 13 (the numerical value of M). That is, the computer found magic numbers *a*, *b*, and *c* with the property that the polynomial *p*(*x*) = *ax*^{2} + *bx* + *c* satisfies the three equations *p*(1) = 10, *p*(2) = 9, and *p*(3) = 13.

Try it! If you give the program a name with *n* letters, it’ll reply with a polynomial of degree *n*−1 or less that does the job.^{3}
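If you’d rather verify my “JIM” example with a computer than by hand, a few lines of Python will do it (this just checks the polynomial quoted above; it’s not the website’s actual code):

```python
# Evaluate the "JIM" Personal Polynomial p(x) = (5/2)x^2 - (17/2)x + 16
# at x = 1, 2, 3 and translate the values back into letters (A = 1, ..., Z = 26).

def p(x):
    return (5 / 2) * x**2 - (17 / 2) * x + 16

letters = {n: chr(64 + n) for n in range(1, 27)}
print([letters[round(p(x))] for x in (1, 2, 3)])  # -> ['J', 'I', 'M']
```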

When you use the Personal Polynomial website it’s hard to see anything sinister lurking nearby. But algebraic expressions have a seductive slickness that can lead us to invest more faith in them than we should. Although you’d be unlikely to be so extremely silly as to plug *x* = 4 into my Personal Polynomial and then announce that the fourth letter of “JIM” should be a V (or that the letter after that should be the thirty-sixth letter of the alphabet), certain American policy-makers did something similar during the early days of the Covid epidemic: they fitted a degree-three polynomial to epidemiological data from the past and used it to predict the future.^{4} You can read more about this in Jordan Ellenberg’s book *Shape*, though for a tongue-in-cheek demonstration of the pitfalls of extrapolation it’s hard to beat what Mark Twain wrote over a century ago:

In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oölitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upwards of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.

Mark Twain, *Life on the Mississippi*

**POLYNOMIALS SET FREE**

Students in algebra classrooms see polynomials in action in all kinds of ways. I already demonstrated *substitution* when I plugged *x* = 1 into (5/2)*x*^{2} − (17/2)*x* + 16 and got 10. There’s also the reverse problem of *solving* for *x*: if all I tell you about *x* is that (5/2)*x*^{2} − (17/2)*x* + 16 equals 10, what values might *x* have? 1 is one such value, but are there others?^{5} Here *x* plays the role of the unknown, the not-yet-known, the about-to-be-known, etc. And when you see a formula like *y* = *ax*^{2} + *bx* + *c*, chances are you’ll be asked to *graph* it.

But what I want to describe to you this month are generating functions, in which *x* is no longer a stand-in for an unknown number; *x* stands proudly on its own. And it brings along some friends.

In a way, generating functions are reminiscent of the numbers Euler gave us when he admitted a new player *i* into the number game and saw what new numbers it gave rise to, using no information about *i* except the equation *i*^{2} = −1. The big difference is that, whereas Euler decreed that *i*^{2} should equal −1, we won’t decree anything about *x*^{2} at all, or *x*^{3} or any higher powers (though we will decree that *x*^{0} equals 1). We remain agnostic about what *x* means and just manipulate expressions according to rules, like the rule that says that *x*^{a} times *x*^{b} equals *x*^{a+b} (“Exponents add when you multiply powers”). *x* has been freed from its obligation to refer to anything outside of itself.

The noted mathematician Herb Wilf^{6} once wrote that “A generating function is a clothesline on which we hang up a sequence of numbers for display.” Wilf wrote the book on generating functions, literally, but I would like to contest, or at least elaborate upon, the word “display”. Clotheslines aren’t just for display; they’re also useful. And so are generating functions. I’ll give three applications, all of which have to do with dice.

**ADDING DICE ROLLS AND MULTIPLYING POLYNOMIALS**

Thousands of years ago, our ancestors felt threatened by mighty and seemingly lawless forces in the air above and the earth beneath and attributed them to powers who might be propitiated by sacrifices or whose actions might at least be predicted through suitable forms of divination. (This was before we had degree-three polynomials.) The unpredictability of processes like the casting of lots seemed to have an affinity with the capriciousness of Nature, and so such processes were thought to provide a kind of channel to the supernatural powers that controlled both.

The ankle-bone (or “talus”) of a hoofed animal would have seemed like a natural choice of divinatory device: these four-sided objects were plentiful, their origin linked them to life and death, and if you tossed one, it was hard to know which side would face upward when it stopped rolling. It’s theorized that over time such oracular bones evolved into cubical dice. In any case we know that even if dice were invented for divinatory purposes, they became coopted for games of chance quite early; some of the oldest dice archeologists have found are dice that have been tampered with in such a way as to bias the outcome (so-called “crooked” or “gaffed” dice). This might indicate an attempt to affect the hazards of weather or warfare but seems more likely to indicate an attempt to acquire wealth at the expense of others. (Even today, we acknowledge the dare-I-say dicey nature of personal finance by referring to a large sum as a “fortune”.) Dice games were commonplace by the time European mathematicians like Pascal and Fermat laid the groundwork for the modern theory of probability in the 1600s.

Consider a die of the modern kind, with six faces showing the numbers 1 through 6 in the form of dots, or “pips” (where a face with 1 pip signifies the number 1, a face with 2 pips signifies the number 2, and so on). We’ll give this die its own Personal Polynomial (yeah, dice aren’t persons, but you know what I mean): the polynomial *x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6}. That is, we take *x* to the power of each of the numbers shown on the die’s six faces, and we add those powers. (In precollege math it’s usual to write polynomials with the highest-degree term first and the lowest-degree term last, but for generating functions it’s more useful to do the reverse, so that the exponents count up instead of down.)

Why is it useful to associate polynomials with dice in this way? Because (as we’ll see) when we multiply the polynomials associated with two dice, we get useful information about what happens when we roll both dice and add the numbers that they show. Similarly, when we square the polynomial associated with one die, we get useful information about what happens when we roll the die twice and add the two numbers we see.

Before we dive into multiplying a degree-six polynomial by itself, let’s take the simpler example of a two-sided die (better known as a “coin”). Suppose the coin has 1 pip on one side and 2 pips on the other side. What can happen when you toss it twice and record the sum? You might get a 1 followed by a 1, or a 1 followed by a 2, or a 2 followed by a 1, or a 2 followed by a 2. So the total can be 2, 3, 3, or 4. We can make a two-by-two table of the four possibilities, which the alert reader (or even a woozy one) may recognize as a very small addition table that shows the sums you can get when you add 1-or-2 plus 1-or-2.

Since exponents add when you multiply powers, we find a similar pattern when we form a very small multiplication table that shows the products you can get when you multiply *x*^{1}-or-*x*^{2} times *x*^{1}-or-*x*^{2}.

It’s no coincidence that these are exactly the terms you get when you multiply *x*^{1}+*x*^{2} by itself and expand using the distributive law (or using “FOIL”, if you insist).

Likewise, imagine a three-sided die with 1 pip, 2 pips, and 3 pips on its three respective sides. We can make a table of all nine possibilities.

Reading the table, we find 1 way to roll a 2, 2 ways to roll a 3, 3 ways to roll a 4, 2 ways to roll a 5, and 1 way to roll a 6. Alternatively, we can multiply *x*^{1} + *x*^{2} + *x*^{3} by itself and collect like terms, obtaining the generating function (*x*^{1} + *x*^{2} + *x*^{3})^{2} = *x*^{2} + 2*x*^{3} + 3*x*^{4} + 2*x*^{5} + *x*^{6}.
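In code, this “multiply the generating functions” operation is just convolution of coefficient lists. Here is a sketch (the helper name is mine, and I store the coefficient of *x*^{k} at index *k*):

```python
# Multiply two polynomials stored as coefficient lists (coefficient of x^k at
# index k). The double loop is the distributive law: every term of p times
# every term of q, with exponents adding.

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

three_sided = [0, 1, 1, 1]   # x^1 + x^2 + x^3
print(poly_mul(three_sided, three_sided))
# -> [0, 0, 1, 2, 3, 2, 1]: 1 way to roll 2, 2 ways to roll 3, 3 ways to roll 4, ...
```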

The deluxe version of the distributive law says that if you have one sum of numbers and you multiply it by another sum of numbers, you have to individually multiply each number in the first sum by each number in the second sum and then add up all the resulting products, being careful not to leave any out or include any twice. Meanwhile, a multiplication table is designed to force us to record each possible product once and only once by giving us a well-defined place to record the answer. So the nine terms in the expansion correspond to the nine entries in the small multiplication table.

Now, at this point you should be skeptical of the usefulness of generating functions. If all we care about is the respective numbers of ways to roll a 2, 3, 4, 5, or 6 using two rolls of our three-sided die, then the extra baggage of those plus-signs and powers of *x* seems like mere typographical distraction from the real message. But suppose we want to know whether two rolls of a six-sided die (the kind of die people actually use in games) is likelier to give us a sum that’s odd or even. We can use generating functions to solve this problem just by *thinking* about the generating function

*p*(*x*) = (*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6})^{2} = *x*^{2} + 2*x*^{3} + … + 2*x*^{11} + *x*^{12}

without actually expanding it out and writing down the intermediate terms.^{7}

**NEGATIVE ONE LENDS A HAND**

The trick involves looking at *p*(−1). (This essay is called “Let x equal x”^{8}, but that doesn’t mean I won’t sometimes want to let x equal some specific number like −1.) Actually, let’s first look at what we get when we replace *x* by −*x* in *p*(*x*). Remember that the polynomial *p*(*x*) is *x*^{2} + 2*x*^{3} + … + 2*x*^{11} + *x*^{12}. So *p*(−*x*) is (−*x*)^{2} + 2(−*x*)^{3} + … + 2(−*x*)^{11} + (−*x*)^{12}, which equals *x*^{2} − 2*x*^{3} + … − 2*x*^{11} + *x*^{12}. The coefficients don’t change in magnitude, but every second plus-sign gets flipped to a minus-sign. Notice that the outcomes in which the sum of the two numbers that are rolled is even correspond to terms with even exponent, which get left alone; but the outcomes in which the sum of the two numbers that are rolled is odd correspond to terms with odd exponent, whose sign gets flipped. So now, if you replace *x* by 1 in *p*(−*x*) (which is the same as replacing *x* by −1 in *p*(*x*)), you get the sum 1 − 2 + … − 2 + 1 in which the positive terms correspond to the ways to roll an even sum and the negative terms correspond to the ways to roll an odd sum.

That means in the contest between the forces of Even and the forces of Odd, we can assess the balance of power by seeing whether *p*(−1) is positive (in which case the outcomes with even sum outnumber the outcomes with odd sum) or negative (in which case the outcomes with odd sum outnumber the outcomes with even sum).

And here’s where the deus-ex-algebra descends upon the scene. Remember, *p*(*x*) equals (*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6})^{2} , so *p*(−1) equals (−1+1−1+1−1+1)^{2}. But that’s just 0^{2}, or 0. So it’s a stand-off between Even and Odd. That is, when you roll two dice, the sum has an equal chance of being even or odd.^{9}

Now you still may be thinking you’d rather not think so hard; you (or an algebraic calculator) could just expand (*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6})^{2} as

1*x*^{2} + 2*x*^{3} + 3*x*^{4} + 4*x*^{5} + 5*x*^{6} + 6*x*^{7} + 5*x*^{8} + 4*x*^{9} + 3*x*^{10} + 2*x*^{11} + 1*x*^{12}

and then just check that 1 + 3 + 5 + 5 + 3 + 1 (the sum of the coefficients of the even powers of *x*) equals 2 + 4 + 6 + 4 + 2 (the sum of the coefficients of the odd powers of *x*). But the real strength of the more abstract approach lies in how well it scales. Because: The problem I *really* wanted to ask you involves rolling a six-sided die *six* times. (Rolling it just twice was only a warm-up.)

So, suppose you roll a six-sided die six times. The generating function for the sum of the six numbers that you roll is (*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6})^{6}, a polynomial with 31 terms that I would never dream of writing out by hand (or if I did dream about it I’d call it a nightmare). But if you just want to know whether the sum of the six numbers is likelier to be even or odd, you can just plug in *x* = −1, obtaining (−1+1−1+1−1+1)^{6}, or 0. So once again, an even sum and an odd sum are equally likely.
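Here is the whole even-versus-odd computation in a few lines of Python: a brute-force tally alongside the generating-function shortcut (the function name is my own):

```python
# Even sums vs. odd sums for k rolls of a standard die. The brute-force count
# of (# even-sum outcomes) - (# odd-sum outcomes) is exactly the generating
# function (x^1 + ... + x^6)^k evaluated at x = -1.

from itertools import product

def parity_gap(k):
    """(# outcomes with even sum) minus (# with odd sum), by enumeration."""
    return sum((-1) ** sum(roll) for roll in product(range(1, 7), repeat=k))

shortcut = sum((-1) ** face for face in range(1, 7)) ** 6   # p(-1) for k = 6
print(shortcut, parity_gap(6))  # -> 0 0: even and odd sums are equally likely
```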

Challenge problem: Suppose we have a (fair) five-sided die. (It’s easy to use a fair six-sided die to simulate a fair five-sided die: just roll it in the ordinary way, and if the outcome is a 6, look around furtively, announce “That didn’t just happen” and roll it again, and keep at it until you get a 1, 2, 3, 4, or 5.) If you roll your five-sided die five times, do you think the sum is likelier to be odd or even? What if you roll it six times? Generating functions give a quick route to the answer.^{10}

Things get even more fun when more variables come into the game.^{11}

**SICHERMAN DICE**

We’ve played with taking powers of the generating function *x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6}; now let’s go the other way and factor it.

Just as six jellybeans in a row can be divided into three groups of two or two groups of three, the terms in the sum

*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6}

can be grouped as

(*x*^{1} + *x*^{2}) + (*x*^{3} + *x*^{4}) + (*x*^{5} + *x*^{6})

or as

(*x*^{1} + *x*^{2} + *x*^{3}) + (*x*^{4} + *x*^{5} + *x*^{6}).

The former grouping can be written as *x*^{0} (*x*^{1} + *x*^{2}) + *x*^{2} (*x*^{1} + *x*^{2}) + *x*^{4} (*x*^{1} + *x*^{2}), which equals (*x*^{0} + *x*^{2} + *x*^{4}) (*x*^{1} + *x*^{2}), while the latter can analogously be written as *x*^{0} (*x*^{1} + *x*^{2} + *x*^{3}) + *x*^{3} (*x*^{1} + *x*^{2} + *x*^{3}), which equals (*x*^{0} + *x*^{3}) (*x*^{1} + *x*^{2} + *x*^{3}).

These factorizations give us curious ways to simulate a 6-sided die using a 2-sided die and a 3-sided die. (Multiplication of generating functions is the “mating behavior” I mentioned at the beginning of the essay.)

Let’s go back to the equation

(*x*^{0} + *x*^{2} + *x*^{4}) (*x*^{1} + *x*^{2}) = *x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6}.

If we have a 3-sided die with faces showing 0 pips, 2 pips, and 4 pips (coming from the first factor on the left-hand of the equation) and a 2-sided die with faces showing 1 pip and 2 pips (coming from the second factor on the left-hand of the equation), and we roll them both and record the sum, the six equally likely outcomes are precisely the numbers 1 through 6.

Similarly, consider the equation

(*x*^{1} + *x*^{2} + *x*^{3}) (*x*^{0} + *x*^{3}) = *x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6}.

It tells us that if we have a 3-sided die with faces showing 1 pip, 2 pips, and 3 pips and a 2-sided die with faces showing 0 pips and 3 pips, and we roll them both and record the sum, the six equally likely outcomes are again the numbers 1 through 6.

This leads us to reinvent what are called Sicherman dice, devised in 1977 by puzzlemaker George Sicherman^{12} (though he did not invent them using generating functions). Remember that the generating function for the sum of two ordinary six-sided dice is (*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6})^{2} . We can use our two factorizations to write this as

(*x*^{0} + *x*^{2} + *x*^{4}) (*x*^{1} + *x*^{2}) × (*x*^{1} + *x*^{2} + *x*^{3}) (*x*^{0} + *x*^{3}).

But, swapping factors, we see that this is equal to

(*x*^{0} + *x*^{2} + *x*^{4}) (*x*^{0} + *x*^{3}) × (*x*^{1} + *x*^{2} + *x*^{3}) (*x*^{1} + *x*^{2})

or (moving a factor of *x* from the *x*-endowed factor *x*^{1} + *x*^{2} to the *x*-impoverished factor *x*^{0} + *x*^{2} + *x*^{4})

(*x*^{1} + *x*^{3} + *x*^{5}) (*x*^{0} + *x*^{3}) × (*x*^{1} + *x*^{2} + *x*^{3}) (*x*^{0} + *x*^{1}).

The first product (*x*^{1} + *x*^{3} + *x*^{5}) (*x*^{0} + *x*^{3}) expands as

*x*^{1} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6} + *x*^{8}

while the second product (*x*^{1} + *x*^{2} + *x*^{3}) (*x*^{0} + *x*^{1}) expands as

*x*^{1} + *x*^{2} + *x*^{2} + *x*^{3} + *x*^{3} + *x*^{4}.

These two polynomials are the generating functions for Sicherman’s dice: the first has six sides bearing the numbers 1, 3, 4, 5, 6, and 8, while the second has six sides bearing the numbers 1, 2, 2, 3, 3, and 4. Despite the fact that the dice look strange, if you roll each once, the sum of the two numbers you roll is statistically indistinguishable from what you’d get from rolling two ordinary dice. And that’s because when you mate the generating functions of Sicherman’s dice, you’re getting the same polynomial factors that you get when you mate the generating functions of two ordinary dice — you just get them in a different order.^{13}
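You can confirm the “statistically indistinguishable” claim directly by tallying all 36 outcomes for each pair of dice; in Python (my own helper names):

```python
# Tally the distribution of the sum for two standard dice and for the
# Sicherman pair; the two tallies should be identical.

from collections import Counter
from itertools import product

def sum_distribution(die_a, die_b):
    return Counter(a + b for a, b in product(die_a, die_b))

standard = [1, 2, 3, 4, 5, 6]
sicherman_a = [1, 3, 4, 5, 6, 8]
sicherman_b = [1, 2, 2, 3, 3, 4]

print(sum_distribution(standard, standard) ==
      sum_distribution(sicherman_a, sicherman_b))  # -> True
```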

Sicherman dice were introduced to the world at large through the Mathematical Games column of Martin Gardner, the writer who was hailed in his time as “the best friend mathematics ever had”. I grew up reading many of those columns in the 1970s, and if it hadn’t been for Gardner I would be a very different sort of mathematician, if I were a mathematician at all.

On a YouTube channel created in Gardner’s honor called “Celebration of Mind”, Alexandre Muñiz has a nice video about Sicherman dice and other things; in the video he shows how you can take two 2-sided dice and two 3-sided dice and (pairing them up one way) simulate two ordinary dice and (pairing them up another way) simulate the Sicherman dice.

Muñiz and I have come up with a curious kind of die that I hope someone will fabricate and send me. It’s a clear cube made of resin or acrylic with an opaque tetrahedral die embedded in it, with the four corners of the tetrahedron corresponding to four of the eight corners of the cube. (Or maybe the surrounding cube exists only as a network of twelve struts; we still haven’t decided what physical instantiation works best.) In any case, the tetrahedron inside the cube has four faces with 0, 1, 2, and 4 pips respectively. You can check that the number of pips visible from above the die after you roll it is 1, 2, 3, 4, 5, or 6, just as with an ordinary die, but the total number of pips is only 7 instead of 28.

The tetrahedron was my idea; I thought one would roll it on a creased surface (such as the inside of an open book) so that it always lands on an edge, as described in a recent Riddler Express puzzle at FiveThirtyEight.com. Michael Branicky suggested using a taco holder while Zach Wissner-Gross preferred a ridged gnocchi board. Muñiz had the fantastic idea of embedding the tetrahedron in a cube. I can see it in my head, but I’d like to hold one in my hand!

[Postscript: Reader Dave LeCompte has fabricated such a die! A photo appears below.

The photo includes a penny for scale.]

If you find the math behind Sicherman dice fun, you might ask the question, for what values of *n* can you design two non-standard *n*-sided dice with the property that, if you roll both dice and record their sum, the outcome is statistically indistinguishable from what you’d get from rolling two standard *n*-sided dice? (Here the standard *n*-sided die has numbers 1 through *n* on its faces.) We’ve seen that you can do it for *n* = 6; what other values are possible? Post your ideas in the Comments.

Incidentally, the Romans used two kinds of dice: a small six-sided die called a tessera whose sides were marked with the numbers from 1 to 6 (that is, essentially a modern cubical die) and a larger four-sided die called a talus whose sides were marked with the numbers 1, 3, 4, and 6. Do you see a way to “Sicherman-ize” the talus die? That is, do you see how to design two four-sided dice, different from the talus and from each other, with the property that if you roll them together, the distribution of the sum is the same as what you would see if you rolled two talus dice? Post your answer in the Comments.

Not related to dice but definitely related to generating functions is the 3Blue1Brown video “Olympiad-level counting” I recommended last month. If you haven’t watched it yet, now’s the time!

**CROOKED DICE**

I’ll close with a famous puzzle about cheating at dice; it can be solved with generating functions, though it is a bit more advanced than the other puzzles. The question is, can you design two crooked 6-sided dice with the property that their sum is equally likely to be any of the eleven numbers 2, 3, 4, …, 10, 11, and 12?

By a crooked die, I mean a die in which the six outcomes don’t have an equal chance of occurring. In practice, a really lopsided weighting would be (a) hard to achieve and (b) easy to detect, but since you’re reading this as a math-fan and not as a gambler, we’ll model a crooked die as being determined by any six nonnegative numbers *a*_{1}, *a*_{2}, … , *a*_{6} that add up to 1, and imagine that those are supposed to be the probabilities of the die landing with the six respective faces facing up.

I want two crooked dice, one associated with the probabilities *a*_{1}, *a*_{2}, … , *a*_{6} and the other associated with the probabilities *b*_{1}, *b*_{2}, … , *b*_{6}, so that when I roll both dice, the sum of the numbers shown is equally likely to take on each of the eleven possible values between 2 and 12.

To get you started on the problem, I’ll claim (without proof) that the property we’re trying to achieve can be restated in terms of the generating functions

*A*(*x*) = *a*_{1} *x*^{1} + *a*_{2} *x*^{2} + *a*_{3} *x*^{3} + *a*_{4} *x*^{4} + *a*_{5} *x*^{5} + *a*_{6} *x*^{6}

and

*B*(*x*) = *b*_{1} *x*^{1} + *b*_{2} *x*^{2} + *b*_{3} *x*^{3} + *b*_{4} *x*^{4} + *b*_{5} *x*^{5} + *b*_{6} *x*^{6}

Specifically, we want to choose numbers *a*_{1}, *a*_{2}, … , *a*_{6} and *b*_{1}, *b*_{2}, … , *b*_{6} so that *A*(*x*) times *B*(*x*) equals (1/11) *x*^{2} + (1/11) *x*^{3} + … + (1/11) *x*^{11} + (1/11) *x*^{12}. Can you find a way to do it or show that it can’t be done? Are there two generating functions that we can mate to create such an offspring?^{14}

*Thanks to Sandi Gubin, Alexandre Muñiz, Bill Ossmann, George Sicherman, James Tanton, and Dan Ullman.*

**REFERENCES**

Gary Antonick, “Col. George Sicherman’s Dice”, https://archive.nytimes.com/wordplay.blogs.nytimes.com/2014/06/16/dice-3/

Martin Gardner, “Sicherman Dice, the Kruskal Count and Other Curiosities”, chapter 19 in Penrose Tiles to Trapdoor Ciphers … and the Return of Dr. Matrix.

Alexandre Muñiz, “How to Roll Two Dice”, https://www.youtube.com/watch?v=-aDfFh5YUD8

George Sicherman, “Sicherman Dice”, https://userpages.monmouth.com/~colonel/sdice.html

**ENDNOTES**

[Note: Some of you may have tried to access specific endnotes by clicking on the associated footnote numbers in the main body of the text, and then been frustrated that this didn’t work. I’m frustrated too! I used to have a way to do this but it doesn’t work in the current version of WordPress. If any of you know a good way to create navigable internal links using the current WordPress implementation of hypertext, please let me know. It’s the year 20-friggin’-22; you shouldn’t have to scroll forward and then scroll back to read the endnotes!]

#1. Others like Pierre Fermat played a role in inventing what are now called Cartesian coordinates, but I don’t want to go down that interesting side-track today.

#2. Technically, Descartes would have written *ax*^{2} as *axx*, reserving exponential notation for powers higher than the 2nd.

#3. How does the mathematical engine lurking behind the website do its magic? You can find out from Tanton’s videos, available through links underneath the Personal Polynomial screen. I don’t know the details of this particular computer program, but I know one way the trick can be done, via the time-honored tactic of breaking a problem into pieces. If we can find three polynomials – call them *p*_{1}(*x*), *p*_{2}(*x*), and *p*_{3}(*x*) – satisfying

*p*_{1}(1) = 10, *p*_{1}(2) = 0, and *p*_{1}(3) = 0,

*p*_{2}(1) = 0, *p*_{2}(2) = 9, and *p*_{2}(3) = 0, and

*p*_{3}(1) = 0, *p*_{3}(2) = 0, and *p*_{3}(3) = 13,

then we can add them together to get a polynomial satisfying

*p*(1) = 10, *p*(2) = 9, and *p*(3) = 13.

If we define *q*_{1}(*x*) = (*x* − 2) (*x* − 3), then the polynomial *q*_{1}(*x*) almost does the job that *p*_{1}(*x*) is supposed to do; it satisfies two of the three conditions, specifically, *q*_{1}(2) = 0 and *q*_{1}(3) = 0. Before you check this, let me warn you that if you check it by expanding *q*_{1}(*x*) as *x*^{2} − 5*x* + 6 and then substituting *x* = 2 and *x* = 3, Descartes will turn over in his grave. The right way to see that *q*_{1}(2) is 0 is to plug *x* = 2 directly into the product (*x* − 2) (*x* − 3); then the first factor is 2 − 2, or 0, so the product must be 0, regardless of what the second factor is. Likewise, the right way to see that *q*_{1}(3) = 0 is to plug *x* = 3 directly into the product (*x* − 2) (*x* − 3). Unfortunately, *q*_{1}(1) isn’t 10; it’s (1 − 2) (1 − 3) = 2. But all you have to do to fix that blemish is multiply *q*_{1} by 5. If we put *p*_{1}(*x*) = 5 *q*_{1}(*x*) = 5 (*x* − 2)(*x* − 3), we get *p*_{1}(1) = 10, *p*_{1}(2) = 0, and *p*_{1}(3) = 0, just as we wanted. So we’ve found *p*_{1}(*x*).

Likewise, we can get *p*_{2}(*x*) by starting with *q*_{2}(*x*) = (*x* − 1) (*x* − 3) and multiplying by a suitable fudge factor (namely −9) to get *p*_{2}(*x*) = −9 (*x* − 1) (*x* − 3) satisfying *p*_{2}(1) = 0, *p*_{2}(2) = 9, and *p*_{2}(3) = 0. Lastly, *p*_{3}(*x*) = (13/2) (*x* − 1) (*x* − 2) satisfies *p*_{3}(1) = 0, *p*_{3}(2) = 0, and *p*_{3}(3) = 13. Putting it all together, we form

*p*(*x*) = *p*_{1}(*x*) + *p*_{2}(*x*) + *p*_{3}(*x*) = 5 (*x* − 2) (*x* − 3) − 9 (*x* − 1) (*x* − 3) + (13/2) (*x* − 1) (*x* − 2),

which (after you expand and recombine the terms) becomes (5/2)*x*^{2} − (17/2)*x* + 16. This is the famous method of Lagrange interpolation.
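If you want to play with Lagrange interpolation yourself, here is a short Python sketch (my own code; it uses exact fractions so that coefficients like 13/2 don’t get mangled by floating point):

```python
# Lagrange interpolation: the unique lowest-degree polynomial through the given
# points, returned as coefficients with the constant term first.

from fractions import Fraction

def mul_linear(poly, r):
    """Multiply a polynomial (constant term first) by (x - r)."""
    shifted = [Fraction(0)] + poly          # this is x * poly
    for m, c in enumerate(poly):
        shifted[m] -= r * c                 # subtract r * poly
    return shifted

def lagrange_coeffs(points):
    n = len(points)
    total = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(points):
        basis = [Fraction(1)]               # becomes prod over j != i of (x - xj)
        denom = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = mul_linear(basis, Fraction(xj))
                denom *= xi - xj
        scale = Fraction(yi) / denom        # the "fudge factor" from the note
        for m, c in enumerate(basis):
            total[m] += scale * c
    return total

print(lagrange_coeffs([(1, 10), (2, 9), (3, 13)]))
# -> [Fraction(16, 1), Fraction(-17, 2), Fraction(5, 2)], i.e. (5/2)x^2 - (17/2)x + 16
```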

#4. What the forecasters did was not quite as dumb as fitting a third-degree polynomial to just four data points. Rather, they took a whole lot of data points and found the third-degree polynomial that fits the data as closely as possible. That’s less dumb, but when you’re crafting national policy in the face of a global health emergency, “less dumb” doesn’t cut it.

#5. You could solve the equation (5/2)*x*^{2} − (17/2)*x* + 16 = 10 by rewriting it as (5/2)*x*^{2} − (17/2)*x* + 6 = 0 and using the quadratic formula, but I prefer to use factoring. Since the left hand side of the equation equals 0 when *x* = 1, the Factor Theorem tells us (5/2)*x*^{2} − (17/2)*x* + 6 must factor as *x* − 1 times some linear polynomial *ax* + *b*. That is, we should be able to find constants *a* and *b* so that (*x* − 1) (*ax* + *b*) expands to give (5/2)*x*^{2} − (17/2)*x* + 6. (Notice how we’ve flipped Descartes’ script: *a* and *b* are the unknowns, not *x*!) Expanding (*x* − 1) (*ax* + *b*) gives *ax*^{2} + (*b* − *a*) *x* − *b*, which is equivalent to (5/2)*x*^{2} − (17/2)*x* + 6 provided that *a* and *b* satisfy the three equations *a* = 5/2, *b* − *a* = −17/2, and −*b* = 6. We can solve the first and last equations to get *a* = 5/2 and *b* = −6 and then check that these values satisfy the middle equation as well. So (5/2)*x*^{2} − (17/2)*x* + 6 factors as (*x* − 1) ((5/2)*x* − 6), telling us that the second root satisfies (5/2)*x* − 6 = 0, or (5/2) *x* = 6, or *x* = 6 / (5/2) = 12/5.
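A quick check of this factoring computation, using exact fractions (my own sketch):

```python
# Both claimed solutions of (5/2)x^2 - (17/2)x + 16 = 10, namely x = 1 and
# x = 12/5, should make the left-hand side equal exactly 10.

from fractions import Fraction

def lhs(x):
    return Fraction(5, 2) * x**2 - Fraction(17, 2) * x + 16

print(lhs(1), lhs(Fraction(12, 5)))  # -> 10 10
```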

#6. Wilf was one of the tallest mathematicians I ever met, and his book had one of the shortest titles of any math book I ever read as measured by word-count, but that’s only because he cheated: his book is called generatingfunctionology instead of something more boring like “Theory and applications of generating functions”. You’ll notice that this month I’m only talking about polynomials, but Wilf’s book also talks about power series, which is the topic of next month’s essay. I should mention that the kind of generating functions Wilf focuses on in his book encode (ordered) sequences, whereas the kind I’m writing about here encode (unordered) sets. Incidentally, Wilf’s name, profession, and stature provided the inspiration for the minor Sesame Street character Herb Wolf.

#7. Gottfried Wilhelm Leibniz, co-inventor of the calculus, asserted in his “Opera Omnia” that when you roll two dice, you’re just as likely to roll a 12 as you are to roll an 11, on the grounds that each outcome can be achieved in exactly one way: the former as 6+6, the latter as 5+6. From this we learn two things: first, that even great mathematicians make mistakes, and second, that Leibniz didn’t spend much time gambling. If Leibniz had spent some of his youth in gambling dens, he would’ve learned (possibly by losing his puffy shirt a few times) that an 11 comes up a lot more often than a 12, and if he’d read the writings of Cardano, Pascal, and Fermat he would have tabulated the 36 equally likely outcomes of rolling two 6-sided dice and he would’ve been able to check that an 11 is exactly twice as likely as a 12. Part of the conceptual difficulty here is that when you roll two dice at the same time, as opposed to rolling a single die twice, it’s harder to see why an 11 corresponds to two separate outcomes. If one die is red and the other is blue, then we can distinguish red-5-and-blue-6 from red-6-and-blue-5, but if the dice are hard to tell apart, then there may not be an easy way for us to tell apart the two outcomes; and if we can’t tell the difference, it’s hard to believe that Tyche, the goddess of chance, should care. But she does!

#8. Yes, I know it’s the name of a Laurie Anderson song.

#9. There are other ways to understand this fact. For instance, take the six-by-six addition table and color entries black if the sum is even and white if the sum is odd. Since each row has three white entries and three black entries, the table as a whole has equally many white entries and black entries. Or: notice that we can divide the table into nine 2-by-2 blocks, each of which contains equal numbers of black and white squares.
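If you like, you can confirm the count mechanically; here is a tiny Python sketch (mine, purely illustrative) that tallies the parities in the six-by-six addition table:

```python
# Tally even ("black") and odd ("white") sums in the 6-by-6 addition table.
sums = [i + j for i in range(1, 7) for j in range(1, 7)]
even = sum(1 for s in sums if s % 2 == 0)
odd = len(sums) - even
print(even, odd)  # 18 18: equally many black and white entries
```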

#10. The generating function for a single roll is *x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5}, so the generating function for the sum of five rolls is (*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5})^{5}. Plugging in *x* = −1, we get (−1 + 1 − 1 + 1 − 1)^{5} = (−1)^{5} = −1, so the odd-degree coefficients in the expansion collectively overpower (just barely) the even-degree coefficients; the sum of the five numbers we roll is ever-so-slightly likelier to be odd than even. On the other hand, (−1+1−1+1−1)^{6} = (−1)^{6} = +1, so with a sixth roll the balance of power shifts; the sum of six numbers is ever-so-slightly likelier to be even than odd.
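For readers who’d like to see the coefficients themselves, here’s a short Python sketch (the helper `poly_mul` is mine, not anything from the essay) that expands the fifth power of the single-roll generating function and compares the even-degree and odd-degree coefficient totals:

```python
def poly_mul(p, q):
    # Multiply two polynomials given as coefficient lists (index = exponent).
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

die = [0, 1, 1, 1, 1, 1]   # x^1 + x^2 + x^3 + x^4 + x^5
g = [1]                    # generating function for zero rolls
for _ in range(5):         # five rolls
    g = poly_mul(g, die)

even = sum(c for k, c in enumerate(g) if k % 2 == 0)
odd = sum(c for k, c in enumerate(g) if k % 2 == 1)
print(even - odd)  # -1, matching the evaluation of the polynomial at x = -1
print(even + odd)  # 3125, i.e. 5^5 equally likely outcomes in all
```

So the odd sums win by exactly one outcome out of 3125, which is what “ever-so-slightly likelier” means here.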

#11. So far we’ve looked at generating functions with a single variable *x* (usually called an indeterminate rather than a variable in this context). But generating functions needn’t be limited to a single indeterminate. Consider for instance the expression (*x* + *y*)^{2}, which expands as *x*^{2} + 2*xy* + *y*^{2}. Writing this as 1*x*^{2} + 2*xy* + 1*y*^{2}, we find that this polynomial is the generating function for the sequence 1, 2, 1. Likewise (*x* + *y*)^{3} is the generating function for the sequence 1, 3, 3, 1; (*x* + *y*)^{4} is the generating function for the sequence 1, 4, 6, 4, 1; and so on. These sequences are the rows of the famous triangle of binomial coefficients attributed variously to Pingala (India), Yang Hui (China), Omar Khayyam (Iran), Tartaglia (Italy), and Pascal (France), that starts like this:

It’s a curious fact that if you alternately add and subtract the elements in any row of the triangle other than the top row, you end up with zero. This isn’t so surprising when the second entry in a row is an odd number like 3 or 5, because then positive terms and negative terms cancel in an obvious way (as in 1 − 3 + 3 − 1). But it’s less obvious why we should have 1 − 4 + 6 − 4 + 1 = 0 and 1 − 6 + 15 − 20 + 15 − 6 + 1 = 0 and so on. Here generating functions can once again help us. Consider (*x* + *y*)^{4} = 1*x*^{4} + 4*x*^{3}*y* + 6*x*^{2}*y*^{2} + 4*xy*^{3} + 1*y*^{4}. Replacing *y* by −*y*, we get

(*x* − *y*)^{4} = 1*x*^{4} − 4*x*^{3}*y* + 6*x*^{2}*y*^{2} − 4*xy*^{3} + 1*y*^{4}.

If we now set *x* = *y* = 1, the left hand side of the inset equation becomes 0^{4}, or 0, while the right hand side becomes 1 − 4 + 6 − 4 + 1, the desired alternating sum. The same trick works for (*x* − *y*)^{n} for any positive integer *n*. (Note, though, that the alternating sum of the entries in the top row of the triangle is 1, not 0. This accords with the fact that there are many mathematical situations, especially in discrete mathematics, where it makes more sense to define 0^{0} to equal 1 rather than 0.) This can be used to show that if you toss a coin one or more times, the probability that the number of heads is even is exactly 1/2, as is the probability that the number of heads is odd.
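As a sanity check, here’s a short Python sketch (mine, not the essay’s) that computes the alternating row sums directly from binomial coefficients:

```python
from math import comb

# Alternating sum of row n of the triangle: C(n,0) - C(n,1) + C(n,2) - ...
for n in range(8):
    alt = sum((-1) ** k * comb(n, k) for k in range(n + 1))
    print(n, alt)  # 1 for the top row (n = 0), and 0 for every n >= 1
```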

#12. Sicherman took to styling himself as The Colonel as a joke, and many people have referred to him in print as such, mistaking the nickname for a military title.

#13. The polynomial *x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6} is actually a product of four irreducible polynomials (polynomials that cannot be factored further): (*x*) (1 + *x*) (1 + *x* + *x*^{2}) (1 − *x* + *x*^{2}). Combining the *x*, 1 + *x* + *x*^{2}, and 1 − *x* + *x*^{2} gives us the *x*^{1} + *x*^{3} + *x*^{5}; pairing up the 1 + *x* and 1 − *x* + *x*^{2} gives us the *x*^{0} + *x*^{3}; and pairing up the *x* and 1 + *x* + *x*^{2} gives us the *x*^{1} + *x*^{2} + *x*^{3}. It’s a theorem of advanced algebra that factorization of polynomials into irreducibles, like factorization of integers into primes, can be done in only one way. The polynomials 1 + *x*, 1 + *x* + *x*^{2}, and 1 − *x* + *x*^{2} are examples of cyclotomic polynomials; specifically, they are *Φ*_{2}(*x*), *Φ*_{3}(*x*), and *Φ*_{6}(*x*), where *Φ*_{n}(*x*) is the polynomial whose roots are precisely the primitive *n*th roots of 1 – that is, the complex numbers *z* that satisfy *z*^{n} = 1 but don’t satisfy *z*^{m} = 1 for any positive integer *m* smaller than *n*.
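To make the factor-shuffling concrete, here is an illustrative Python sketch (the helper `poly_mul` is mine) that multiplies coefficient lists, with list index playing the role of exponent:

```python
def poly_mul(p, q):
    # Multiply two polynomials given as coefficient lists (index = exponent).
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

x, one_plus_x = [0, 1], [1, 1]       # x and 1 + x
phi3, phi6 = [1, 1, 1], [1, -1, 1]   # 1 + x + x^2 and 1 - x + x^2

# x (1+x) (1+x+x^2) (1-x+x^2) should be x + x^2 + ... + x^6:
std = poly_mul(poly_mul(x, one_plus_x), poly_mul(phi3, phi6))
print(std)                                 # [0, 1, 1, 1, 1, 1, 1]

# Regrouping the same four factors as in the endnote:
print(poly_mul(poly_mul(x, phi3), phi6))   # [0, 1, 0, 1, 0, 1] = x + x^3 + x^5
print(poly_mul(one_plus_x, phi6))          # [1, 0, 0, 1]       = 1 + x^3
print(poly_mul(x, phi3))                   # [0, 1, 1, 1]       = x + x^2 + x^3
```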

#14. Since *A*(*x*) and *B*(*x*) are both divisible by *x*, it’s handy to pull out those factors of *x* and focus on the 5th degree polynomials *A**(*x*) = *A*(*x*)/*x* and *B**(*x*) = *B*(*x*)/*x*, and to write the equation as *A**(*x*) *B**(*x*) = (1/11) (1 + *x* + *x*^{2} + ··· + *x*^{10}). *A**(*x*) is a 5th degree polynomial, and every polynomial of odd degree has a real root, so there exists a real number *r* such that *A**(*r*) = 0, implying that *A**(*r*) *B**(*r*) = 0. If we had *A**(*x*) *B**(*x*) equal to (1/11) (1 + *x* + *x*^{2} + ··· + *x*^{10}), then, plugging in *x* = *r*, we’d get (1/11) (1 + *r* + *r*^{2} + ··· + *r*^{10}) = 0. Multiplying the equation by 11 and then by 1 − *r*, we get (after lots of cancellation) 1 − *r*^{11} = 0. So *r* must satisfy 1 − *r*^{11} = 0, and the only real number *r* with that property is *r* = 1. Since we assumed *A**(*r*) = 0, we must have *A**(1) = 0. But *A**(1) isn’t 0; in fact it’s *a*_{1} + *a*_{2} + ··· + *a*_{6}, which is 1, not 0.

This proof won’t work for *n*-sided dice when *n* is odd, but in this case complex numbers can come to our rescue. I’ll take *n* = 5 for definiteness. We are looking for polynomials *A*(*x*) = *a*_{1} *x*^{1} + *a*_{2} *x*^{2} + *a*_{3} *x*^{3} +

We mathematicians have nobody but ourselves to blame, since it was one of our own (René Descartes) who saddled numbers like sqrt(−1) with the term “imaginary” and another mathematician (Carl Friedrich Gauss) who dubbed numbers like 2+sqrt(−1) “complex”. Now it’s several centuries too late for us to ask everybody to use different words. But since those centuries have given us a clearer understanding of what these new sorts of numbers are good for, I can’t help wishing that, instead of calling them “complex numbers”, we’d called them — well, I’ll come to that in a bit.

Mind you, I totally get why sqrt(−1) got called imaginary. “sqrt(−1)” signifies a number *x* with the property that *x*^{2} = −1, but no respectable number behaves that way. A law-abiding number is positive, negative, or zero. If *x* is positive, *x*^{2} will be positive too. If *x* is negative, *x*^{2} will still be positive, since a negative number times a negative number is a positive number (see my essay “Going Negative, part 1” and other Mathematical Enchantments essays about negative numbers if you’re wondering why the product of two negative numbers is positive). And if *x* is zero, *x*^{2} will be zero. In none of the three allowed cases is *x*^{2} negative, so you can’t have *x*^{2} equal to −1. Sorry; it’s an impossible equation. And you might think that *that* would end the matter …

… except that five hundred years ago algebraists learned, to their astonishment, that expressions involving the square roots of negative numbers can be useful intermediate stages of certain calculations that have sensible final answers. So mathematicians grudgingly invited square roots of negative numbers into the house of mathematics but only through the back door, and chose nomenclature that would let those impossible square roots know in no uncertain terms that they were second-class citizens who could mix with other algebraic expressions but who needed to be out the door when their work was done, before respectable company arrived. (See Endnote #1.)

**MATH AND MYSTICISM**

Isaac Asimov, in his essay “The imaginary that isn’t” (from his book *Adding A Dimension*), describes an encounter he had with a sociology professor while he was an undergraduate in the 1930s. The professor sorted humankind into two groups, “realists” and “mystics”, and asserted that mathematicians belong to the latter camp because “they believe in numbers that have no reality.” When young Asimov asked him to explain, the professor cited the example of the square root of minus one, saying “It has no existence. Mathematicians call it imaginary. But they believe it has some kind of existence in a mystical way.” Asimov protested that imaginary numbers are just as real as any other kind of number, and the professor challenged the upstart to hand him the square root of minus one pieces of chalk, and … but I’ll break off the story there for now, because I like to imagine it going a different way: I like to imagine young Asimov asking, “So, where would you put electrical engineers in your classification of humankind? You know, people like Steinmetz?”

Any professor teaching in an American college in the 1930s would have known of Charles Proteus Steinmetz, even though he’s no longer a household name the way Edison and Tesla are. The “Wizard of Schenectady” was as responsible for the electrification of America as anyone else (arguably more than Edison, who had stubbornly insisted on trying to transmit direct current along power lines until Steinmetz and Tesla and their allies proved the superiority of alternating current). The sociology professor was undoubtedly teaching in a classroom lit from the ceiling by electricity generated miles away, thanks to Steinmetz.

“Steinmetz built things. He was a realist,” the professor would have said.

“Oh?” Asimov could have replied. “Then why was he an evangelist for the square root of minus one?”

**THE WIZARD OF SCHENECTADY**

Steinmetz was in many ways Edison’s opposite, and not just because of their different ideas about how power should be transmitted across long distances. Edison was a commanding five foot ten; Steinmetz was only four feet tall. Edison gave his assistants (“muckers”, they were called) puny salaries; Steinmetz once refused to take a raise from his employer because he felt that his assistants weren’t paid enough. Edison had three biological children; Steinmetz never had any because he was determined not to pass on his genes for kyphosis and hip dysplasia, opting instead to adopt a younger colleague as his son and become a loving grandfather to the colleague’s children. But, like Edison, Steinmetz was a workaholic whose success in solving technological problems came in part from the fact that he devoted his life to them.

As a young man in Bismarck’s Prussia, Karl August Rudolph Steinmetz joined a fraternity that bestowed on him the nickname Proteus after the shape-shifting Greek sea-god from whose name we get the adjective “protean”. Later, his membership in a socialist student group got him in trouble with the authorities and he was forced to flee the country. As a disabled person with very little command of English, he was almost turned away at Ellis Island until a friend spoke up for him, exaggerating his talents in a way that his past accomplishments didn’t justify (but his future accomplishments would). Young Karl became Charles and adopted his former nickname as his legal middle name. He went to work for a friend, Rudolph Eickemeyer, and stayed at Eickemeyer’s company out of loyalty even after his early achievements got the attention of bigger companies. When General Electric offered him a large salary increase if he’d leave his friend’s company, Steinmetz was puzzled: what did salary have to do with the principle of loyalty? General Electric resolved the impasse by buying Eickemeyer’s company.

The late 19th century was an era of technological promise, much of it bound up in the transforming potential of electricity. The major problem was how to get electricity to all the different places where it could do its magic. Edison favored the straightforward approach of pushing electrons over wires from point A to point B, since electricity derives from the motion of electrons. But others favored the less intuitive idea of rocking electrons back and forth along a wire, alternately pushing and pulling. That kind of current, an alternating current, has many technological advantages (which is why it predominates today), but alternating current is harder to model mathematically because it’s dynamic in a way that direct current isn’t. Let’s take a look at that.

First, picture a simple direct current (DC) circuit containing a battery, a lightbulb, and two wires connecting the battery to the bulb in both directions. Three important quantities are the voltage *V* (the difference in electrical potential between the two ends of the wire), the current *I* (how many electrons are traveling along the wire), and the resistance *R* (how hard the electrons have to work to make the trip), and as long as the bulb doesn’t burn out, the quantities are constant. If you were to plot current and voltage as functions of time (with time on the horizontal axis and current or voltage on the vertical axis), you’d just see boring horizontal lines. Moreover, there’s a simple equation relating these three quantities, called Ohm’s Law: *V* = *I R*.

But in alternating current (AC) circuits, voltage and current vary over time, and there’s no such simple linear relationship between them. For instance, if you live in the U.S., an outlet labeled “120V” is actually giving you a voltage that oscillates between +170V and −170V; 120 is just the average over time (the “root-mean-square average”, for those of you who care). In the figure below I give a typical plot, showing voltage (the blue curve) and current (the gold curve) as functions of time in an AC circuit. Sometimes the voltage is increasing and the current is increasing, but sometimes the voltage is increasing and the current is decreasing. Both voltage and current follow the pattern of a sine wave, or “sinusoid”, but typically peak current doesn’t coincide with peak voltage; in most circuits there’s a mismatch, or “phase shift”. (See Endnote #2.) Gone is the simplicity of *V* = *I R*.

You can see this sort of phase-shift in action if you watch a kid on a swing (after they stop pumping) and you simultaneously attend to position and speed, or more precisely, deflection from the vertical and angular velocity. When the kid is as far to the right as possible, their speed is (instantaneously) zero. When the kid has swung back down to the lowest point of their trajectory (where the deflection is zero), the motion is leftward and the speed is at its maximum. When the kid is as far to the left as possible, their speed is again zero. When the kid returns to the zero-deflection point, the motion is rightward and the speed is at its maximum. And so on. The instant of maximum rightward deflection comes after the instant of maximum rightward speed, specifically 1/4 of a cycle later.

If you were to plot, at each instant, a point whose *x*-coordinate is the deflection of the swing (positive when the swing is on the right, negative when the swing is on the left) and whose *y*-coordinate is the angular velocity of the swing (positive when the swing is moving rightward, negative when the swing is moving leftward), the moving point would trace out a circle, as shown in the illustration. The circle doesn’t exist in physical space; rather, it exists in a notional “phase space”, in which the vertical axis is for the *velocity* of the swing, not its position. (Here I’m ignoring some niceties about pendular motion; specifically, its deflection-velocity plot is only approximately circular, and only when the amplitude is small. And as every kid who’s been on a swing knows, it’s the big deflections that are the fun part!)

The mathematics of circular motion is usually described using trigonometric functions, and indeed one can describe current and voltage in alternating-current circuits using sines and cosines, but the formulas can get quite hairy. What Steinmetz realized is that some seemingly pure math he’d learned in his student days could make the formulas much simpler. (See Endnote #3.)

**PURE IMAGINATION**

Mathematicians had played with imaginary and complex numbers long before their games had any real-world applications. One of the mathematicians who played the hardest was Leonhard Euler, who in 1777 introduced the symbol “*i*” to signify the square root of minus one. Euler operated on the assumption that whatever “*i*” might be, it should satisfy the ordinary rules of algebra. So for instance 2*i* times 3*i* should be

(2*i*)×(3*i*) = (2)×(*i*)×(3)×(*i*) = (2)×(3)×(*i*)×(*i*) = (2×3)×(*i*×*i*) = (6)×(−1) = −6

and 1+*i* times 1+*i* should be

(1+*i*)×(1+*i*) = 1×1 + 1×*i* + *i*×1 + *i*×*i* = 1 + *i* + *i* + (−1) = *i* + *i* = 2*i*

(where the first equality is an application of the distributive law or, if you prefer, “FOIL”; but see Endnote #4).

Later, the mathematicians Jean-Robert Argand, Caspar Wessel and Carl Friedrich Gauss independently came up with a visual way to represent complex numbers. You draw a horizontal axis for the real numbers and a vertical axis for the imaginary numbers meeting at a point called the origin, and you depict the complex number *a*+*bi* by a point that’s *a* units to the right of the origin and *b* units up from the origin, as shown for the complex numbers 2+*i*, 3+*i*, and 5+5*i*. (If *a* is negative, go left instead of right; if *b* is negative, go down instead of up.) Note by the way that the origin represents the complex number 0+0*i*, which is simultaneously real and imaginary. My daughter, shortly after learning all this, exclaimed “Wait, so zero has been a complex number under my nose this whole time?” Absolutely!

The wonderful thing about the definition of complex number multiplication is the geometry that’s hiding inside it (exactly the kind of shape-shifting geometry Steinmetz needed). Suppose a specific complex number *a*+*bi* other than 0+0*i* is represented by the specific point *P* in the plane in the manner described above. Let *O* stand for the origin (where the axes cross, aka 0+0*i*), and let *N* be the point on the horizontal axis that corresponds to the complex number 1+0*i*, aka the real number 1. We define the “magnitude” of the complex number *a*+*bi* as the length of segment *OP*, and we define the “phase” or “angle” of the complex number *a*+*bi* as the measure of angle *NOP* (some people call the phase the “argument” but I won’t). For example, when *a*=*b*=1, triangle *NOP* is an isosceles right triangle with legs of length 1, so the magnitude of 1+*i* is sqrt(2) and the phase of 1+*i* is 45°.

The miracle is that if we define multiplication of complex numbers in the way that the ordinary rules of algebra force us to, then **magnitudes multiply and angles add**! For instance, look at 2*i* times 3*i*. 2*i* has magnitude 2 and phase 90°, and 3*i* has magnitude 3 and phase 90°; the product of 2*i* and 3*i*, namely −6, has magnitude 6 = 2 × 3 and phase 180° = 90° + 90°. Or look at 1+*i* times 1+*i*. 1+*i* has magnitude sqrt(2) and phase 45°; its product with itself, namely 2*i*, has magnitude 2 = sqrt(2) × sqrt(2) and phase 90° = 45° + 45°. (A puzzle for some of you: can you show that the “multiplication miracle” holds when the two complex numbers being multiplied are 2+*i* and 3+*i*? See Endnote #5.)
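If you’d rather check numerically than prove geometrically, Python’s built-in complex type makes the verification a one-liner apiece. This little sketch (a numerical check, not a proof, and it does peek at the puzzle’s example) tests the multiplication miracle on 2+*i* and 3+*i*:

```python
import cmath

z, w = 2 + 1j, 3 + 1j
prod = z * w                       # (2+i)(3+i) = 6 + 2i + 3i - 1 = 5 + 5i
print(prod)                        # (5+5j)

# Magnitudes multiply ...
print(abs(z) * abs(w), abs(prod))  # both equal sqrt(50)

# ... and angles add (phases converted to degrees).
def deg(c):
    return cmath.phase(c) * 180 / cmath.pi

print(deg(z) + deg(w), deg(prod))  # both approximately 45 degrees
```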

Multiplying a complex number by −1 has the effect of leaving the magnitude alone while rotating the corresponding point halfway around the origin. (In fact, the rule for multiplying complex numbers gives us a new way to understand the rule for determining the sign of the product of two real numbers; see Endnote #6.) In a similar way, multiplying any complex number by *i* has the effect of rotating the corresponding point a quarter of the way around the origin, in the counterclockwise direction; see Endnote #7. Steinmetz realized that the mathematics of multiplication by *i* was a very crisp way of representing the physics of a 90 degree phase shift. (See Endnote #8.) He couldn’t use the letter *i* because electrical engineers were already using *I* to represent current flow (recall our earlier equation *V* = *I R*), so Steinmetz chose to use *j* instead, and to this day many electrical engineers use *j* instead of *i* to signify the square root of −1.

The sociology professor’s mistake (back in Asimov’s story) lay in part in thinking that mathematics is only about static quantities, such as the number of pieces of chalk you’re holding in your hand. But math can also be about things that, like mythical Proteus, keep changing. Take the professor’s chalk and drag it against a blackboard, at an angle that makes a squeaky sound some find painful, and you have an oscillating physical system that could be described by sines and cosines but might better be described through the use of complex numbers. Indeed, the 90 degree phase shift (of which the complex number *i* is the numerical thumbprint) is ubiquitous in physics. I’ve already mentioned electrical circuits and kids on swings, but there are lots of other examples.

Real numbers have magnitude and sign; analogously, complex numbers have magnitude and phase. That’s why I wish complex numbers had been dubbed “phased numbers”. (See Endnote #9.) Real numbers are phased numbers whose phase is either 0 degrees (for positive real numbers) or 180 degrees (for negative real numbers). Likewise, imaginary numbers are phased numbers whose phase is either 90 degrees or 270 degrees.

I should stress that Steinmetz did not experimentally discover that the flow of electrons had a hitherto unnoticed imaginary component — he merely showed that the mathematical formalism of electrical engineering becomes simpler if we *pretend* that the current that we measure is but the shadow, along the real line, of a quantity whose true home is the complex plane. The gif “Another way of looking at sine and cosine functions” (created by Christian Wolff; permission pending) illustrates this in a lovely way. A green point moves in a circle in the *x*,*y* plane. Its projection onto the *x*,*z* plane gives a blue sinusoid, while its projection onto the *y*,*z* plane gives a red sinusoid that is 90 degrees out of phase with the first one. In our analogy, one of these sinusoids is real current (or real voltage) while the other sinusoid is imaginary current (or imaginary voltage). If we apply this point of view to our AC circuits, then we can revive the equation *V* = *I R* by reinterpreting all three quantities as complex numbers. *V* now represents the complex voltage, *I* now represents the complex current, and *R* gets replaced by a complex number called “impedance” that extends the concept of resistance. (See also Endnote #10.) The linear relation between current and voltage, so handy in the study of direct current circuits, has been restored! And the only price we had to pay was to graduate from the real number line to the complex number plane.
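Here’s a tiny Python sketch of the revived Ohm’s law. The circuit values are made up for illustration; the one piece of real engineering I’m borrowing is the standard formula *Z* = *R* + *jωL* for the impedance of a resistor and inductor in series:

```python
import cmath

R = 100.0              # resistance in ohms (hypothetical value)
L = 0.5                # inductance in henries (hypothetical value)
w = 2 * cmath.pi * 60  # angular frequency for 60 Hz alternating current

Z = R + 1j * w * L     # complex impedance of the series R-L circuit
I = 1.2 + 0j           # complex current, in amperes

V = I * Z              # Ohm's law, restored: V = I Z
print(abs(V))                           # magnitude of the complex voltage
print(cmath.phase(Z) * 180 / cmath.pi)  # degrees by which voltage leads current
```

The phase of *Z* is exactly the phase shift between the voltage and current sinusoids, which is the whole point: one complex multiplication encodes both the amplitude relationship and the phase lag.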

The mathematics of back-and-forth in one dimension is best expressed in terms of the mathematics of round-and-round in two dimensions. For instance, when you spin this disk clockwise (see Endnote #11), the vertical coordinates of the blue and gold points match up with the behavior of the blue and gold curves I used to illustrate the behavior of voltage and current in an AC circuit. And as the disk spins about its center, the ratio of the gold point to the blue point remains fixed if we view the two points as complex numbers, since the ratio of their magnitudes stays 4-to-3 while their phase-lag stays 90 degrees.

(If any of you reading this are good at creating animations, please let me know; I’d love to be able to include a gif that shows the spinning disk and makes the connection with those sinusoids “pop”!)

Since complex currents and complex voltages are useful fictions, not scientific facts, perhaps the sociologist was right to call this way of thinking about the world “mystical”. But if so, Steinmetz was an extremely unusual and useful kind of mystic: not the kind who makes occult pronouncements about the spirit plane but the kind who, invoking a different sort of plane, brings about a world in which it’s easier to make toast.

**THE GREAT UNIFICATION**

If you’re impatient to get back to Asimov’s sociology professor and find out what really happened in that classroom, you might want to skip this section. But I can’t resist giving you a peek into what mathematicians did with Euler’s *i*, starting with Euler himself. (This is the stuff Steinmetz would have learned as a math major at the University of Breslau.)

The best thing Euler did with the number *i* was discover the equation

*e*^{iθ} = cos *θ* + *i* sin *θ*

where *e* is the constant 2.718… discovered by Jacob Bernoulli but often called Euler’s number, cos is the cosine function, and sin is the sine function. (For more about *e*, watch the 3Blue1Brown video “What’s so special about Euler’s number e?”, and for more about Euler’s amazing formula, watch the 3Blue1Brown video “What is Euler’s formula actually saying?”.) What makes this equation astonishing is that the left and right sides of this equation come from different worlds. The left side is an exponential function (if we leave aside the suspicious circumstance of the exponent being an imaginary number), and therefore points at phenomena like compound interest, population growth, radioactive decay, and the initial spread of novel pathogens. (It was indeed the application of exponential functions to banking that led Bernoulli to discover *e* in the first place.) Meanwhile, the right hand side (again ignoring the *i*) features two functions, sine and cosine, introduced thousands of years ago for surveying land, navigating seas, and plotting the paths of planets. It would seem that the compounding of interest has little to do with the motions of heavenly bodies, yet Euler’s formula tied them together intimately, showing them to be two different aspects of a single mathematical phenomenon. We celebrate Newton’s unification of terrestrial ballistics with the motion of the Moon, and Maxwell’s unification of electricity, magnetism, and light, but we don’t say nearly enough about how Euler’s discovery built a secret passageway that links numerous disciplines within mathematics and far beyond.
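You can spot-check Euler’s formula numerically with Python’s cmath module (a check at a few sampled angles, not a derivation):

```python
import cmath
import math

# e^{i t} and cos t + i sin t should agree for every real t.
for t in (0.0, math.pi / 4, 1.0, math.pi, 5.0):
    lhs = cmath.exp(1j * t)
    rhs = complex(math.cos(t), math.sin(t))
    assert abs(lhs - rhs) < 1e-12
print("Euler's formula checks out at every sampled angle")
```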

The complex number *e*^{iθ} always lies on the circle of radius 1 centered at 0. If we want to talk about other kinds of nonzero complex numbers, we use the representation *r* *e*^{iθ}, where *r* is the magnitude of the complex number and *θ* is its phase.

One picayune but useful consequence of Euler’s monumental discovery is that you don’t have to memorize many trig formulas once you know how to traverse the passageway between the world of exponential functions and the world of trig functions; see Endnote #12. You can also use complex numbers to get algebraic proofs of certain geometric facts (see Jim Simons’ video “Three pretty geometric theorems, proved by complex numbers”) and to find nice solutions to combinatorial puzzles (see Endnote #13 as well as the 3Blue1Brown video “Olympiad-level counting”) and sometimes to reduce nasty-looking geometric optimization problems to manageable complexity (see Endnote #14).

More profound applications of the complex numbers turned up in 19th century mathematics, especially Bernhard Riemann’s work in number theory, leading French mathematician Paul Painlevé to write “Between two truths of the real domain, the easiest and shortest path quite often passes through the complex domain.” (The saying was popularized by Jacques Hadamard through his book *The Psychology of Invention in the Mathematical Field*, in which he prefaced the adage by “It has been written…” without acknowledging Painlevé as the source.)

Even though the advent of complex numbers led to new beginnings in many branches of mathematics, in an important way, it was an ending too. Earlier math had been full of equations whose solutions seemed impossible but which led to new kinds of numbers. Want to solve 2*x* = 1? Invent fractions. Want to solve *x*+2 = 1? Invent negative numbers. Want to solve *x*^{2} = 2? Invent irrational numbers. Want to solve *x*^{2} = −1? Invent imaginary numbers. You might think we could keep at this game forever, writing down impossible equations and then inventing new numbers to render the impossible possible.

For instance, what about the equation *x*^{2} = *i*? You might think we need to go beyond the complex numbers to solve it. But we don’t, and if you remember how multiplication of complex numbers works, it’s not hard to figure out where the square root of *i* is hiding in the complex plane: it has magnitude 1 and phase 45°. That is, it’s *r* + *ir* where *r* = 1/sqrt(2), so that (*r* + *ir*)^{2} = 2*r*^{2} *i* = *i*. (There’s also a square root of *i* with phase 225° on the other side of the origin.) So we can solve *x*^{2} = *i* in the complex number system without having to bring new numbers into the game.
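A quick numerical check in Python (again a check, not a proof): taking *r* = 1/sqrt(2), the number *r* + *ri* squares to *i*, and so does its opposite:

```python
import cmath
import math

r = 1 / math.sqrt(2)
z = complex(r, r)     # magnitude 1, phase 45 degrees
print(z * z)          # essentially 1j, up to floating-point error
print((-z) * (-z))    # the square root with phase 225 degrees works too
print(cmath.sqrt(1j)) # Python's own answer: about 0.7071 + 0.7071j
```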

This is just one example of a theorem that’s so important that it’s called the Fundamental Theorem of Algebra: if you write down an equation (more specifically a polynomial equation) involving a single unknown number *x*, that equation (unless it reduces to something silly like *x* = *x*+1) will always have a solution in the system of complex numbers. So you could say that with the advent of complex numbers, the discipline of algebra, after many centuries of wandering and struggle, had finally found its true home.

**WHAT IS REAL?**

But wait — I sense a presence in the room: it’s the spirit of Asimov’s professor, and he’s downright gleeful. “All you’ve proved is that a spirit of mysticism has infected the world of science, thanks to the closet-mathematician Steinmetz and other traitors to Reality!” And (though it’s galling to have a ghost lecture me about mysticism versus realism) I have to admit he has a point. Steinmetz identified as a mathematician in his younger years, before he came to America and switched to engineering. And the “infection” has continued to spread.

At first the use of complex numbers was confined to branches of physics that studied wavelike phenomena. If you want to understand how light works in classical optics, you need to think of a photon as a kind of self-sustaining feedback loop between an electrical oscillation and a magnetic oscillation, propagating through space. To understand this screwlike motion, you need the twisty mathematics offered by complex numbers.

Then in the first half of the 20th century came the quantum revolution. Physicists came to realize that elementary particles (and to some extent objects made up of those particles, even including macroscopic ones like pieces of chalk) had a wavelike aspect, and that certain phenomena could only be understood if you treated complex numbers not just as a useful fiction but as part of the bedrock of reality.

An elementary particle, viewed as a wave, has a phase, and we can experimentally measure how particles’ phases change when they interact. Probabilities don’t just add; sometimes they cancel, interfering destructively the way waves do. (See Endnote #15.) Quantum physics has phase baked into its structure at the smallest scales that our current theories can reach. It’s not just light that behaves in a screwy way; quantum physics asserts that the whole damned universe is screwy, which is why we need twisty mathematics to describe it.

There’s another sense in which the sociology professor was sort of right (though several centuries behind the times): complex numbers did arise from an approach to math that renounced the physical world and even common sense. The 16th century algebraist Gerolamo Cardano, after deriving the complex roots of the “impossible” equation *x*^{2}−10*x*+40=0, declared his own analysis to be “as subtle as it is useless”. Rafael Bombelli, building on Cardano’s work, made complex numbers more respectable by giving clear and consistent rules for operating with them, but he never attempted to explain what complex numbers *were*. (See Endnote #16.)

Hiding in the background of Bombelli’s work was the radical notion (announced more overtly in 19th century England) that if you give clear and consistent rules for operating with fictional quantities, then you can study those fictional quantities on their own terms as elements of a notional number system, deferring or dismissing the question of what those quantities actually mean. This gives license to a sort of “mysticism” in which mathematicians create new number-systems simply by specifying rules of operation, not worrying about whether or how these number-systems correspond to anything in the real world. Maybe there’ll be an application in a hundred years, or a thousand, or never; who knows? In the meantime, there’s plenty of exploring to do.

In Asimov’s anecdote, when the professor challenges Asimov to hand him the square root of minus one pieces of chalk, the brash undergrad says he’ll do it if the professor first gives him one-half of a piece of chalk. When the professor breaks a piece of chalk into two pieces and gives one to Asimov, saying “Now for your end of the bargain,” Asimov points out that what he’s been handed is a single (smaller) piece: *one* piece of chalk, not one-half. The professor counters that “one-half a piece of chalk” means one half of a *standard* piece of chalk, and Asimov asks the professor how he can be sure that it’s exactly half, and not, say, 0.48 or 0.52 of a standard piece of chalk.

What I take away from the end of Asimov’s story is that the difference between a “concrete” number like one-half and an “abstract” number like the square root of minus one is a difference in degree, not a difference in kind. Both are useful fictions. The fictional aspect of one-half comes into view when we notice that the professor’s attempt to hand Asimov half a piece of chalk depends on both a societal agreement on what a standard piece of chalk is and a societal agreement about how much error is permitted. The latter is a bit hazy; where do we draw the line between dividing something in half and dividing it into two unequal pieces? Come to think of it, I’m sure there are measurable differences between the different pieces of chalk that come out of a chalk factory. Quality control doesn’t require that the differences be indiscernible. So the definition of a “standard” piece of chalk is a bit fuzzy too.

Of course I’m splitting hairs here, and ordinary conversation demands adherence to a largely unspoken agreement about which hairs to leave unsplit. And that indeed is my point. Even a seemingly simple mathematical concept like one-half is a collaboration between the universe and a society of minds observing the universe — just like the square root of minus one.

Or to put it more succinctly: Real numbers are more imaginary than most people realize, and imaginary numbers are more real than most people imagine.

*Thanks to Richard Amster, Sidney Cahn, Jeremy Cote, Sandi Gubin, Henri Picciotto, Tzula Propp, and Paul Zeitz.*

**REFERENCES**

Titu Andreescu and Zuming Feng, *102 Combinatorial Problems: From the Training of the USA IMO Team*, 2003.

Isaac Asimov, *Adding A Dimension*, 1964.

Floyd Miller, *The Man Who Tamed Lightning: Charles Proteus Steinmetz*, 1965.

Paul Nahin, *An Imaginary Tale: The Story of sqrt(−1)*, 1998.

David Richeson, “The Scandalous History of the Cubic Formula”, *Quanta Magazine*, 2022.

Danny Augusto Vieira Tonidandel, “Steinmetz and the Concept of Phasor: A Forgotten Story”, 2013.

Paul Zeitz, *The Art and Craft of Problem Solving*, 2006.

**ENDNOTES**

#1. Specifically, one method for solving the equation *x*^{3} = 15*x* + 4 involves writing the solution in the form

*x* = ∛(2+sqrt(−121)) + ∛(2−sqrt(−121))

before rewriting it as

*x* = (2+sqrt(−1)) + (2−sqrt(−1)),

cancelling the two impossible terms of opposite sign, and concluding (correctly) that *x*=4 solves the problem. (The rewriting is justified because (2+sqrt(−1))^{3} equals 2+sqrt(−121), as you can check by expanding the cube.) See the Veritasium video “How Imaginary Numbers Were Invented” as well as David Richeson’s Quanta Magazine article listed in the References for more on this.
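
If you’d like to see the whole computation verified by machine rather than by hand, a few lines of Python suffice (the checks below are mine, not part of the original derivation):

```python
# Check the cube-root identity behind the cancellation:
# (2 + i)^3 = 2 + 11i = 2 + sqrt(-121), and likewise for the conjugate.
from cmath import sqrt

root = sqrt(-121)                      # the "impossible" quantity, 11i
assert abs((2 + 1j) ** 3 - (2 + root)) < 1e-9
assert abs((2 - 1j) ** 3 - (2 - root)) < 1e-9

x = (2 + 1j) + (2 - 1j)                # the impossible terms cancel...
assert x == 4                          # ...leaving x = 4,
assert x ** 3 == 15 * x + 4            # which indeed solves x^3 = 15x + 4
```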

#2. This picture is potentially misleading since current and voltage are measured in different units; superimposing them has no physical meaning. However, it’s still a helpful way to compare the phases of the two quantities. In the illustration, current lags behind voltage by one-quarter of a cycle, which is what happens when your only circuit elements are inductors. The phase shift when your only circuit elements are capacitors is also one-quarter of a cycle, but in the opposite direction, with voltage lagging behind current. For circuits that contain both capacitors and inductors, things get complicated; more specifically, as Steinmetz noticed, they get complex!

#3. The two functions I chose for the voltage and current in my figure depicting alternating current were 4 cos *t* and 3 sin *t*, two sine waves that are 90 degrees out of phase. Ignoring the fact that one of them represents a voltage and the other represents a current, let’s add them. Here’s a graph of the function 4 cos *t* + 3 sin *t*:

Notice that we get another sine wave, but it’s in phase with neither 4 cos *t* nor 3 sin *t*. Interestingly, if you were to measure the amplitude of this function — the sum of a sine wave of amplitude 4 and a sine wave of amplitude 3 — you’d find that it’s exactly 5. And if you suspect that this equality has something to do with the 3-4-5 right triangle, then (just like the triangle) you are right! The crucial fact is that the two sine waves being combined were exactly 90 degrees out of phase with each other. If we’d added two sine waves that were in phase with each other, one of amplitude 4 and the other of amplitude 3, we’d get a sine wave of amplitude 4+3=7 because of constructive interference. In the opposite case, where the waves are 180 degrees out of phase with each other, we’d get a sine wave of amplitude 4−3=1 because of destructive interference. And in the intermediate case, where the waves are 90 degrees out of phase with each other, we get a sine wave whose amplitude is the magnitude of the complex number 4+3*i*, namely sqrt(4^{2}+3^{2}) = 5. Sine waves, unlike pieces of chalk in a classroom, can interfere with each other constructively or destructively or in an intermediate manner.
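
Readers who’d like to confirm the amplitude-5 claim numerically can do so in a few lines of Python (my own sanity check, not part of the essay’s argument):

```python
import math

# Sample 4cos(t) + 3sin(t) densely over one period and measure its peak.
ts = [2 * math.pi * k / 100000 for k in range(100000)]
amplitude = max(4 * math.cos(t) + 3 * math.sin(t) for t in ts)
assert abs(amplitude - 5.0) < 1e-6      # matches |4 + 3i| = 5

# Indeed 4cos(t) + 3sin(t) = 5cos(t - phi) with phi = atan2(3, 4).
phi = math.atan2(3, 4)
assert all(abs(4 * math.cos(t) + 3 * math.sin(t) - 5 * math.cos(t - phi)) < 1e-9
           for t in ts[::1000])
```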

#4. Some computer programmers who implement complex number arithmetic use a slight variant of the formula. On computers, multiplications are more time-consuming (or as one says “expensive”) than additions, so one often focuses on reducing the number of multiplications even if the number of additions is increased. Let *E* = *ac*, *F* = *bd*, and *G* = (*a*+*b*)(*c*+*d*). Then clearly *E*−*F* is *ac*−*bd* and you can check that *G*−*E*−*F* is *ad*+*bc*. So a computer can calculate the real and imaginary parts of the product of two complex numbers using just three real multiplications rather than the obvious four real multiplications.
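
Here is the three-multiplication trick as a short Python sketch (the function name is mine):

```python
def mul3(a, b, c, d):
    """Return (real, imag) of (a+bi)(c+di) using 3 real multiplications."""
    E = a * c
    F = b * d
    G = (a + b) * (c + d)
    return (E - F, G - E - F)     # (ac - bd, ad + bc)

# Check against Python's built-in complex multiplication.
assert mul3(2, 1, 3, 1) == (5, 5)         # (2+i)(3+i) = 5+5i
z = complex(2, 1) * complex(3, 1)
assert (z.real, z.imag) == (5.0, 5.0)
```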

#5. The magnitudes of 2+*i* and 3+*i* are respectively sqrt(2^{2}+1^{2}) = sqrt(5) and sqrt(3^{2}+1^{2}) = sqrt(10), whose product is sqrt(50), which is the magnitude of 5+5*i*, which is the product of 2+*i* and 3+*i*. Likewise, using some trigonometry you can show that if *𝛼* is the phase of 2+*i* and *𝛽* is the phase of 3+*i*, then *𝛼* plus *𝛽* is exactly 45 degrees, which is the phase of 5+5*i*. One way to prove this is to use the tangent addition formula: we know that tan *𝛼* is 1/2 and tan *𝛽* is 1/3, so tan(*𝛼*+*𝛽*) is

(1/2 + 1/3) / (1 − (1/2)(1/3)) = (5/6)/(5/6) = 1,

implying that *𝛼*+*𝛽* = 45 degrees.
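
All of these claims are easy to verify on a computer:

```python
import cmath
import math

z1, z2 = complex(2, 1), complex(3, 1)
prod = z1 * z2
assert prod == complex(5, 5)                           # (2+i)(3+i) = 5+5i

# Magnitudes multiply: sqrt(5) * sqrt(10) = sqrt(50).
assert abs(abs(z1) * abs(z2) - abs(prod)) < 1e-12

# Phases add: alpha + beta = 45 degrees, the phase of 5+5i.
alpha, beta = cmath.phase(z1), cmath.phase(z2)
assert abs(math.degrees(alpha + beta) - 45.0) < 1e-12
```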

#6. Positive numbers have phase 0 degrees and negative numbers have phase 180 degrees. So the rule for the sign of a product of real numbers as embodied in the table

(+)(+) = +, (+)(−) = −, (−)(+) = −, (−)(−) = +

is essentially the same as the rule for adding angles that are multiples of 180 degrees as embodied in the table

0°+0° = 0°, 0°+180° = 180°, 180°+0° = 180°, 180°+180° = 360° = 0°.

#7. I’ve heard of an intriguing bit of kinesthetic pedagogy that Michael Pershan and Max Ray-Riek developed as a way of informally introducing middle-schoolers to complex numbers. Kids arranged on a number line can be led to invent 90 degree rotation as a choreographic enactment of multiplication by the square root of −1! Visit Henri Picciotto’s page Kinesthetic Intro to Complex Numbers to learn more. Teachers and students may also find other things of interest at Picciotto’s more advanced page for complex number pedagogy.

#8. Like most stories that shine a spotlight on a single pioneering innovator, my story leaves out a lot. Steinmetz wasn’t the first or only person to suggest using complex numbers to understand electrical circuits involving alternating current; several people had the idea independently at about the same time. But Steinmetz was the chief proponent of this method in the U.S., and in his writings he compellingly demonstrated its virtues.

#9. Gauss called numbers like 2+3*i* “complex” because of the way they are compounded of a real part (2) and an imaginary part (3*i*). This terminology stresses the additive side of complex numbers, that is, the way you can build them up by adding simpler components together. But that doesn’t tell us anything interesting about what complex numbers are or what they’re good for. Vectors (which we’ll meet in a later essay) are also built by adding simpler components together. For that matter, I could introduce “fruity numbers” like “2-apples-and-3-bananas”, and I could say things like “2-apples-and-3-bananas plus 5-apples-and-7-bananas equals 7-apples-and-10-bananas”, and I could represent fruity numbers using two-dimensional diagrams; then fruity numbers would look a lot like complex numbers and they’d behave the same way vis-a-vis addition, but they’d be very different from complex numbers. What’s distinctive about the complex numbers (as opposed to the fruity numbers) is the specific, meaningful way in which one can multiply them: when you multiply two complex numbers, the phases get added. That’s why I think “phased numbers” is a better name for them.

#10. Going back to the example with *V*(*t*) = 4 cos *t* and *I*(*t*) = 3 sin *t* (the first picture in the essay), we find that the complex voltage is 4*e*^{*it*} while the complex current is 3*e*^{*i*(*t*−*π*/2)}, which equals −3*i*·*e*^{*it*}; the quarter-cycle lag of the current behind the voltage shows up as the factor −*i*, a quarter-turn in the complex plane. Taking real parts recovers the original 4 cos *t* and 3 sin *t*.
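
The relationship between these complex exponentials and the original sine waves can be checked numerically:

```python
import cmath
import math

for t in [0.0, 0.7, 1.9, 3.3]:
    v = 4 * cmath.exp(1j * t)                     # complex voltage
    i = 3 * cmath.exp(1j * (t - math.pi / 2))     # complex current, a quarter-cycle behind
    assert abs(v.real - 4 * math.cos(t)) < 1e-12  # Re(v) is the physical voltage
    assert abs(i.real - 3 * math.sin(t)) < 1e-12  # Re(i) is the physical current
    assert abs(i - (-3j) * cmath.exp(1j * t)) < 1e-12   # i equals -3i * e^{it}
```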

#11. The convention of trigonometry is that counterclockwise is the *positive* direction and clockwise is the *negative* direction. I suppose we mathematicians could try to convince the world to use clocks that go the other way, but it’s a hard sell; I expect we’ll have to wait a very long time before this happens.

#12. See for instance the video “Double Angle Identities Using Euler’s Formula”. Some may say “That’s a lot of algebra; isn’t it easier just to look it up?”, but I’m not the only mathematician I know who’d rather re-derive a trig identity via complex exponentials.

#13. Here’s a puzzle of mine that Paul Zeitz used in his book of contest problems. Given a circle of *n* lightbulbs, exactly one of which is initially on, you’re allowed to change the state of a bulb (on versus off) together with the state of every *d*th bulb after it (where *d* is a divisor of *n* other than *n* itself), provided that all *n*/*d* of the bulbs involved were originally in the same state as one another (that is, all on or all off). For what values of *n* is it possible to turn all the lights on by making a sequence of moves of this kind?

For example, take *n*=12. We have 12 lights in a circle, one of which is on. You’re allowed to toggle 2, 3, 4, 6, or 12 bulbs from off to on (provided that they’re evenly spaced around the circle), and you’re also allowed to toggle 2, 3, 4, 6, or 12 bulbs from on to off (provided that they’re evenly spaced around the circle). Taking as many moves as you need, can you get all the lights to be on? If that’s too hard, can you get all the lights to be off? Or if that’s still too hard, can you get there to be exactly one light on, but it’s a different light than the one that was on at the start?

If you like puzzles, this may be a good time to stop reading and think for a bit.

All of these tasks are impossible, and not just for *n*=12. And we can prove it with complex numbers, provided we know one key fact: if you have two or more evenly spaced points on the circle of radius 1 in the complex plane, their sum is 0. I won’t prove the fact here, but let’s see how it shows that the lights puzzle can’t be solved. The trick is to look at the *sum* of the positions of the bulbs that are on, using complex number addition. The sum starts out being nonzero because exactly one light is on, but the sum is supposed to end up being zero: in the original version all *n* lights end up on, and the sum of all *n* evenly spaced positions is zero (and in the all-off variant the final sum is an empty sum, which is zero too). But anytime you turn a bunch of lights on, they’re evenly spaced, so the sum of their positions is zero, which means that turning those lights on doesn’t affect the sum of the positions of the turned-on lights. Likewise, when you turn a bunch of lights off, they’re evenly spaced, so the sum of their positions is again zero, and when you subtract zero from a complex number, you don’t change it. So the sum never changes, and since it starts out nonzero, it can never become zero. (The same invariant rules out ending with a single *different* light on: a different bulb’s position is a different value of the unchanging sum.)
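
For skeptics, here are the key fact and the invariant checked numerically for *n* = 12 (a sketch of mine, not part of the original argument):

```python
import cmath

n = 12
bulbs = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

# Key fact: any set of >= 2 evenly spaced bulbs has positions summing to 0,
# so no legal move changes the sum of the positions of the lit bulbs.
for d in (1, 2, 3, 4, 6):                       # proper divisors of 12
    for start in range(n):
        move = [bulbs[(start + j * d) % n] for j in range(n // d)]
        assert abs(sum(move)) < 1e-9

# One lit bulb: sum has magnitude 1.  All twelve lit: sum is 0.
assert abs(bulbs[0]) == 1.0
assert abs(sum(bulbs)) < 1e-9
```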

#14. Here is a special case of a new problem I call the “repelling propellers problem”. I place blue dots at the 12 o’clock, 3 o’clock, 6 o’clock, and 9 o’clock positions on a circle. I want to place three red dots on the circle as well, 120 degrees apart from one another, in such a way as to maximize the product of all twelve of the red-point-to-blue-point distances. How do I do it? It looks nasty; there are twelve point-to-point distances to be multiplied, and each of them will be something like a trig function or involve a square root if we adopt a straightforward approach. But complex numbers yield a nice solution. Here again, you might want to stop reading and think on your own for a bit (though you’ll need to know some things about complex numbers that aren’t explained in my essay).

Let the circle in question be the unit circle in the complex plane, so that the blue points are at 1, *i*, −1, and −*i*. Let *ω* be cis 120° so that *ω*^{2} is cis 240°; if *z* represents the position of one red point, the other red points are at *ωz* and *ω*^{2}*z*. (You can think of the seven points as the tips of propellers that rotate around 0, with a repelling force between the red propeller tips and the blue propeller tips.)

The distance between two complex numbers *𝛼* and *𝛽* is |*𝛼*−*𝛽*|, the magnitude of their difference. So the quantity we want to maximize is the product of the twelve magnitudes |*z*−1|, |*z*−*i*|, |*z*+1|, |*z*+*i*|, |*ωz*−1|, …, |*ω*^{2}*z*+*i*|.

Since the magnitude of a product of complex numbers equals the product of the magnitudes and vice versa, we can rewrite this expression as the magnitude of the product

(*z*−1)(*z*−*i*)(*z*+1)(*z*+*i*)(*ωz*−1)(*ωz*−*i*)(*ωz*+1)(*ωz*+*i*)(*ω*^{2}*z*−1)(*ω*^{2}*z*−*i*)(*ω*^{2}*z*+1)(*ω*^{2}*z*+*i*).

But this product (call it *P*(*z*)) can be written in a much simpler way. To see how, consider the values of *z* that make the product equal to 0. Go back to the geometrical problem but consider the reverse desideratum: if you want to make the product of the twelve distances as *small* as possible, the best places to put those red dots are at 12 o’clock, 1 o’clock, 2 o’clock, …, 10 o’clock, and 11 o’clock, because for all of those positions, there’ll be a red dot and a blue dot that coincide and so are at distance zero from each other, which makes the product of the twelve distances equal to zero as well. That means that the twelve complex numbers cis 0°, cis 30°, cis 60°, …, cis 300°, and cis 330° are all roots of the degree-twelve polynomial *P*(*z*). But those complex numbers are just the roots of the equation *z*^{12} − 1 = 0. So *P*(*z*) must equal *C* (*z*^{12} − 1) for some constant *C*.

We’re making progress here, and though you may be worried about the fact that I haven’t worked out what *C* is, you’ll soon see that we don’t need to know it. |*P*(*z*)|, the quantity we need to maximize, is |*C* (*z*^{12} − 1)|, which equals |*C*| |*z*^{12} − 1|, and |*C*| is constant, so all we’re really trying to do is maximize |*z*^{12} − 1|, which is the distance between *z*^{12} and 1. That is, we want to find a point *z* on the circle of radius 1 that makes *z*^{12} as far from 1 as possible. But as *z* varies over the circle of radius 1, so does *z*^{12}, and the point on this circle that’s as far from 1 as possible is the point −1. So we need *z*^{12} = −1, which we can achieve (for instance) with *z* = cis 15°, placing red dots at 2:30, 10:30, and 6:30. In fact, if you place a red dot halfway between any two consecutive hour-marks, and place the other two dots accordingly, you’ll get one of the four dot-configurations that maximizes the product of the red-to-blue distances. (If you’re curious, placing the dots in this way makes the product of all twelve of those distances equal to exactly 2. But my problem didn’t ask you to figure that out.)
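
If you’d like to double-check the answer numerically, here is a short Python sketch (the helper function is mine, not part of the puzzle):

```python
import cmath
import math

blues = [1, 1j, -1, -1j]                      # the four fixed blue dots
omega = cmath.exp(2j * cmath.pi / 3)          # cis 120 degrees

def product_of_distances(z):
    """Product of the 12 red-to-blue distances for red dots z, wz, w^2 z."""
    reds = [z, omega * z, omega ** 2 * z]
    p = 1.0
    for r in reds:
        for b in blues:
            p *= abs(r - b)
    return p

z_best = cmath.exp(1j * math.radians(15))     # a red dot at 2:30
assert abs(product_of_distances(z_best) - 2.0) < 1e-9       # the claimed maximum, 2
assert product_of_distances(cmath.exp(1j * math.radians(5))) < 2.0   # a worse placement
```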

In a more general version of the problem, there are two propellers, one with *p* evenly spaced blades and one with *q* evenly spaced blades. If you can solve the *p*=3, *q*=4 case, you shouldn’t find the general case much harder. An even more general version of the problem features more than two propellers; I don’t know a general solution.

#15. Freeman Dyson, in his article “Birds and Frogs”, published in 2009 in the *Notices of the American Mathematical Society* (volume 56, pages 212–223) wrote: “Schrödinger … started from the idea of unifying mechanics with optics. A hundred years earlier, Hamilton had unified classical mechanics with ray optics, using the same mathematics to describe optical rays and classical particle trajectories. Schrödinger’s idea was to extend this unification to wave optics and wave mechanics. Wave optics already existed, but wave mechanics did not. Schrödinger had to invent wave mechanics to complete the unification. Starting from wave optics as a model, he wrote down a differential equation for a mechanical particle, but the equation made no sense. The equation looked like the equation of conduction of heat in a continuous medium. Heat conduction has no visible relevance to particle mechanics. Schrödinger’s idea seemed to be going nowhere. But then came the surprise. Schrödinger put the square root of minus one into the equation, and suddenly it made sense. Suddenly it became a wave equation instead of a heat conduction equation. And Schrödinger found to his delight that the equation has solutions corresponding to the quantized orbits in the Bohr model of the atom. It turns out that the Schrödinger equation describes correctly everything we know about the behavior of atoms. It is the basis of all of chemistry and most of physics. And that square root of minus one means that nature works with complex numbers and not with real numbers. This discovery came as a complete surprise, to Schrödinger as well as to everybody else.”

#16. Bombelli’s *Algebra* wasn’t just the first text to explain the rules governing complex numbers; it was also the first clear European treatment of the rules governing negative numbers. Of course, Chinese and Indian mathematicians already knew about negative numbers and how to work with them, but they hadn’t tried taking square roots of negative numbers as far as I’m aware. Then again, the Indian mathematician Brahmagupta came up with a formula that in some ways foreshadows the discovery of complex numbers. Remember when I said that when you multiply two complex numbers, their magnitudes get multiplied? Write those two complex numbers as *a*+*bi* and *c*+*di*, so that their product is (*ac*−*bd*)+(*ad*+*bc*)*i*. The magnitudes of these three complex numbers are sqrt(*a*^{2}+*b*^{2}), sqrt(*c*^{2}+*d*^{2}), and sqrt((*ac*−*bd*)^{2}+(*ad*+*bc*)^{2}). So my assertion about how magnitudes multiply becomes the formula sqrt(*a*^{2}+*b*^{2}) sqrt(*c*^{2}+*d*^{2}) = sqrt((*ac*−*bd*)^{2}+(*ad*+*bc*)^{2}), which if you square both sides becomes the simpler but still surprising formula (*a*^{2}+*b*^{2})(*c*^{2}+*d*^{2}) = (*ac*−*bd*)^{2}+(*ad*+*bc*)^{2}, true for all real numbers *a*,*b*,*c*,*d*. This formula tells us that if two positive integers can each be written as a sum of two perfect squares, so can their product. Brahmagupta knew this formula (and others like it), but he couldn’t have known that a thousand years later it would play a role in the study of complex numbers!
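
Brahmagupta’s identity is easy to test directly:

```python
# Brahmagupta's identity: (a^2+b^2)(c^2+d^2) = (ac-bd)^2 + (ad+bc)^2.
def two_squares_product(a, b, c, d):
    return (a * c - b * d) ** 2 + (a * d + b * c) ** 2

for (a, b, c, d) in [(1, 2, 3, 4), (2, 1, 3, 1), (5, 0, 0, 7), (9, 4, 2, 11)]:
    assert (a * a + b * b) * (c * c + d * d) == two_squares_product(a, b, c, d)

# So a product of two sums of two squares is again a sum of two squares:
# 5 = 1^2 + 2^2 and 13 = 2^2 + 3^2 give 65 = (1*2 - 2*3)^2 + (1*3 + 2*2)^2.
assert (1 * 2 - 2 * 3) ** 2 + (1 * 3 + 2 * 2) ** 2 == 65
```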

You could say that the source of their eventual breakup was present from the start, when they put together a model of the universe at their wedding. It was a sweet but, in hindsight, naive gesture. You see, Math had discovered that there were exactly five ways of sticking identical regular polygons together to form perfectly symmetrical solids (we humans named these regular shapes “Platonic solids” in honor of the philosopher who officiated at the wedding, though he didn’t discover any of them); delighted by the discovery, Math brought the five solids to the ceremony, as gifts for her bride-to-be. Meanwhile, Phyz brought her own gifts: earth, air, fire, water, and “quintessence” (heaven-stuff), the five elements from which she said the universe was constructed. (See Endnote #1.) Five regular solids? Five elements? Surely this marriage was foreordained! (See Endnote #2.) Math and Phyz exchanged gifts and proclaimed their bond, swearing that they would never part. And if any onlookers thought the correspondence between the gifts was forced, they had the good manners to keep their mouths shut.

But after a few millennia, latent tension in the relationship rose to the surface. Physics kept growing and changing, revising her core principles, sheepishly deciding for instance that earth, air, fire, and water weren’t true elements after all. But Math couldn’t help noticing that even as Phyz discovered new elements, Math didn’t have to update her inventory of regular solids. She had in fact found a proof that there couldn’t be any more, and the proof remained valid down the centuries, even as Phyz kept revising her own basic tenets. Oh, and here’s another example: Physics said that projectiles rise in a straight line before falling along a curve, until she said oops, no, they rise along a curve too. Math was embarrassed by the flightiness and unreliability of Phyz, even as Phyz was embarrassed by the stodginess of Math.

Over time Math became more fussy and equivocal. She began to hedge her statements, refusing to say what was true, but merely making conditional assertions of the form “Well, *if* assumptions A, B, and C are true, *then* conclusions X, Y, and Z follow.” Or: “*To the extent that* assumptions A, B, and C are approximately true, *to that same extent* conclusions X, Y, and Z should hold as approximations as well.” Though she hated the way she sounded when she said things like that.

But you shouldn’t think that Math was merely retreating into wishy-washiness or sterile perfectionism. Math was growing just as much as Physics was, but in different ways. And it wasn’t that Math lacked commitment to her relationship with Physics; she just felt too confined by where Phyz lived. Eventually, sometime around 1900, she said “I need to see different universes,” and she moved out.

**HIGHER DIMENSIONS**

One issue that highlights the divide between Math and Physics is the issue of higher dimensions. Do they exist? Math and Physics have very different answers. In physics, the most naive (and mostly right) answer is “No”: you can’t construct an object with four lines that are at right angles to one another. (Of course, you can change the question and then the answer becomes “Yes”, and then you can change it again and the answer becomes “Maybe”, but I’ll get back to that shortly.) On the other hand, in mathematics, we can lay down axioms for *n*-dimensional Euclidean geometry not just for *n*=2 and *n*=3 but for any positive integer *n*. From these axioms, consequences can be derived, and every mathematician will obtain the same consequences, so higher-dimensional spaces are as real as any other mathematical construct: they’re consistent creations of the human mind with properties that all logical minds will assent to, not because the axioms are true (whatever that would mean!) but because the entities under discussion satisfy the axioms by definition. Mathematics nowadays is a language for describing possible universes, of which the universe that we happen to inhabit is just one example.

Turning away from my conceit of Mathematics and Physics as personified beings and turning towards a consideration of human history, consider the careers of the mathematician Ludwig Schläfli (1814-1895) and the physicist Albert Einstein (1879-1955). Schläfli wanted to know what sorts of higher-dimensional regular solids (“regular polytopes” is the more technically correct phrase) exist in *n*-dimensional Euclidean space for values of *n* bigger than 3. He showed that there are *six* regular solids in 4-dimensional Euclidean space but only *three* regular solids in *n*-dimensional Euclidean space when *n* is 5 or 6 or any higher integer. On the other hand, Albert Einstein pursued a view of physics in which our 3-dimensional space needs to be conceived of as part of a 4-dimensional geometry of “spacetime” in which the properties of space and time become interwoven. (See Endnote #3.)

Despite the fact that the two thinkers’ lives overlapped — indeed, Einstein’s precocious ruminations about riding a beam of light occurred around the time of Schläfli’s death — in an important sense their work did not overlap at all. Partly that’s because the two great relativity theories that Einstein developed aren’t Euclidean; special relativity uses what we now call Minkowski space (with time playing a privileged role that distinguishes it from the other three dimensions; see Endnote #4), and general relativity makes the game even deeper by allowing Minkowski space to warp and bend. But more importantly, Einstein was concerned with *our* world while Schläfli was concerned with idealized *mathematical* worlds.

Nowadays there are speculations that our physical universe might have extra dimensions that are too small to see. The possibility of there being extra dimensions is a tantalizing one (it’s the “Maybe” I mentioned earlier), but in math, extra dimensions are more than a possibility: they become an actuality, albeit just one of many coexisting actualities, because math (as we understand it nowadays) isn’t about *actuality*, but about *possibility*.

**GNOSTIC PHYSICS VERSUS GNOSTIC MATH**

To help me further clarify the divide between math and physics, I’ll recruit a couple of hypothetical demiurges (similar to the one postulated by the Gnostics, but nerdier).

So, imagine if all at once all over the world a booming voice were heard, saying: “Hello, hello! Hello everyone. (Is this working? Oh good.) I am a mighty Demiurge, and I have decided to confess: I’ve been messing with you. Most of you are aware that some religious fundamentalists on your planet believe that I or someone like Me planted dinosaur bones in the ground as a test of your faith. Well, I *have* been messing with you. But not using dinosaur bones. Instead, I’ve been interfering with the behavior of electricity and magnetism and light on your planet for several centuries. The bottom line is, Newtonian physics is correct, and special relativity is wrong. There actually *is* a preferred reference frame for observers; the luminiferous ether is real; Maxwell’s equations are wrong; et cetera, et cetera. Surprise!”

Such a pronouncement would give us reason to reconsider Einstein’s theories, but not Schläfli’s. The existence of exactly six regular polytopes in four dimensions is a fact of pure reason, not an experimental observation. *If* we lived in a four-dimensional Euclidean space, *then* there would be exactly six different ways to stick regular three-dimensional polyhedra together to form regular four-dimensional polytopes. The Demiurge’s proclamation wouldn’t change that.

In contrast, we might imagine a different Demiurge who pipes up “That’s nothing! *I* messed with Ludwig Schläfli’s head, and the heads of *everyone* who ever read his work or reconstructed it for themselves, so that nobody would notice the logical fallacy in his proof and discover the hidden seventh regular polytope; every time one of you humans reads the argument for why it can’t exist, I make your brains go BLOOP at the crucial moment and you miss the mistake!” That’s an entirely different kind of mischief. It’s one thing for us to suspect that our observations have misled us; it’s a more disturbing thing to suspect that our processes of reasoning are themselves flawed, and this suspicion quickly leads us to far more radical doubts that undermine not just Schläfli’s work or Einstein’s but the entire scientific enterprise and our whole sense of self.

(For instance, I don’t *think* that I’m just a brain in a vat. But wait a second: what right have I to say what I think or don’t think, if I can be mistaken about what my own thoughts are? But wait another second: the words “what right have I to say that” only make sense if I can say things, and if I’m a brain in a vat, I only *think* I’m saying things. Then again, what does “wait a second” even mean if Time itself is an illusion? And …)

To stress the difference between physics and mathematics, I’ll borrow a phrase introduced by the paleontologist Stephen Jay Gould to try to broker an amicable divorce between science and religion. Gould called the disciplines “non-overlapping magisteria” and contended that they weren’t in conflict because science’s questions are “what/when/where?” questions while religion’s questions are “why?” questions, and that there can be no contradiction between the *is* and the *ought*. There are some problems with Gould’s attempt to resolve a key tension of the modern age, not least of which is that fundamentalists of various faiths maintain that their religious scriptures give clear statements of What Is from the Creator of the Universe (who presumably would be in a position to know). But my point is that, in a similar way, math and physics fail to collide because they fail to connect. Math is an engine for deriving non-obvious consequences of assumptions, but it cannot tell us which assumptions to make. We can compare the predictions of mathematics with observations of the real world and use the resulting concordance or discrepancy to decide whether the assumptions that led to those predictions are useful in explaining the world, but when we do this we are doing physics, not math.

**THE THREE FACES OF PI**

Then again, maybe we should think of math and physics as *overlapping* magisteria, and picture things like the famous quantity pi (3.14159…) as living in the overlap. On the one hand, pi is a physical quantity that measures the ratio between the circumferences and diameters of actual, physical circles; on the other hand, it’s a mathematical quantity that is for instance equal to 4 times the limit of the infinite sum 1 – 1/3 + 1/5 – 1/7 + … Let’s call the former *physical pi* and the latter *formulaic pi*. In fact, I want a third pi that I’ll call *geometric pi*. Geometric pi is the ratio between the circumference and the diameter of an ideal mathematical circle, whether or not such circles exist in our world. Mathematical reasoning can lead us, by a beautiful but complicated path, from geometric pi (“circumference divided by diameter”) to formulaic pi (4 times 1 – 1/3 + 1/5 – 1/7 + …, or some other formula for pi you prefer) but it doesn’t tell us whether Euclid’s axioms are a true description of our world. If they’re not, then geometric pi and formulaic pi, although exactly equal to each other, don’t pertain to the world we live in except perhaps as approximations.
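
Formulaic pi is something a computer can chase directly; here is the series just mentioned, as a short Python sketch:

```python
import math

def leibniz_pi(n_terms):
    """Partial sum of 4 * (1 - 1/3 + 1/5 - 1/7 + ...)."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

# Convergence is slow (the error after n terms is roughly 1/n), but the
# limit is pi -- pinned down by reasoning alone, with no circle measured.
assert abs(leibniz_pi(1_000_000) - math.pi) < 1e-5
```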

Let’s take this idea further. Physical pi involves measuring things, and it can only be known up to finite accuracy. If we can only build circles up to 10^{20} meters across, and we can only measure them to within an accuracy of 10^{–20} meters, then we can only know the diameter or circumference of a circle with 40 significant figures, and when we take the ratio of two such measurements (the circumference and the diameter), we again get only 40 significant figures. In this setting, does it make sense to talk about the hundredth digit of that ratio as having a definite value if there is no way to measure it? Indeed, quantum physics tells us that the whole game of measuring lengths becomes problematic at the subatomic scale. Likewise, general relativity says that once you start building things (like a super-big blackboard on which to draw a super-big circle), the things you build will warp space, causing deviations from Euclid’s axioms (which only apply to flat space, not curved space). So when we talk about computing hundreds of digits of pi, we don’t — *can’t* — mean pi the physical constant; we must mean pi the mathematical quantity, defined by expressions like 4 times (1/1 – 1/3 + 1/5 – 1/7 + …).

A Demiurge might be able to warp space to change our measurements, but it’d have to warp our brains to make us think that 4(1/1 – 1/3 + 1/5 – 1/7 + …) equalled 5, say.

By the way, I don’t want to leave you with the mistaken impression that formulaic pi denotes the value given by the specific formula 4(1/1 – 1/3 + 1/5 – 1/7 + …). There are thousands of known formulas for pi, and it’s the totality of them, not any one in particular, that constitutes what I’m calling formulaic pi. We know that they’re equal not by measuring objects but by reasoning about mathematical expressions, in the place where Math lives.
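If you’d like to watch one of those formulas crawl toward pi, here’s a small Python sketch. (This is my illustration, not anything from the discussion above; the name `leibniz_pi` is invented for this example.) It contrasts the slow Leibniz series with Machin’s much faster 1706 formula, one of the “thousands of known formulas”:

```python
import math

def leibniz_pi(n_terms):
    """Partial sum of 4(1/1 - 1/3 + 1/5 - 1/7 + ...), one formula for 'formulaic pi'."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

# The Leibniz series converges slowly: after n terms the error is on the order of 1/n.
err = abs(leibniz_pi(100_000) - math.pi)  # roughly 1e-5

# Machin's formula, pi = 4*(4*arctan(1/5) - arctan(1/239)), nails pi to machine
# precision almost immediately; as mathematical quantities, the two formulas
# pin down exactly the same number.
machin_pi = 4 * (4 * math.atan(1 / 5) - math.atan(1 / 239))
```

The point of the comparison is the one made in the text: the formulas differ wildly as *computational recipes*, yet reasoning (not measurement) tells us they denote a single number.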

**WHERE MATH WENT**

It’s hard to say where Math went when she moved out of our universe. Plato pointed the way to her new home when he wrote “You know that the geometers make use of visible figures and argue about them, but in doing so they are not thinking of these figures but of the things which they represent; thus it is the absolute square and the absolute diameter which is the object of their argument, not the diameter which they draw.” That is, human geometers may draw pictures, but when we draw those pictures we’re thinking about something Absolute, even if it’s in a realm we can’t get to.

In her new home, Math doesn’t have to equivocate and add “… (assuming that Euclid’s axioms are correct)” at the end of every statement of a theorem of Euclidean geometry; she can just make assertions about ideal squares, diameters, circles, etc. that perfectly satisfy Euclid’s axioms, period. Build a Euclidean square whose base is the diagonal of some given Euclidean square, and the new square has area exactly twice the area of the old square. In the place to which Math has gone, there’s no need to worry about black holes warping the picture, or quantum foam undermining the diagram at sub-nanoscale. The constructs that pervade Math’s new home are precisely what they were constructed to be. In some ways it’s a lonely place, but it’s where you need to go if you want to connect with perfect truth, and to know the things you can be absolutely, positively sure of.

Since we humans can’t get to where Math went, we argue about whether the place even exists, and Math is cool with that. And she’s not even lonely, because guess who’s been visiting her there, and sometimes even spending the night? *Physics!* Phyz wants to talk about quantum field theory in *n* dimensions and how it relates to general relativity in *n*+1 dimensions, for all values of *n*!

In the best comedies of remarriage, the two parties to the marriage have done some growing during their period of separation. Perhaps each of them has developed characteristics of the other, becoming better-rounded people in the process. Or perhaps they have become more tolerant of themselves and others. Either way, the new relationship they develop is not the same as the one they had before.

Going back to our celestial Couple, Physics came to accept that Math’s flirtatiousness, her inability to be satisfied with just one universe (or even some large but finite number of them!), wasn’t just a sign of immaturity; her flirtatiousness was a key component of her nature. Phyz realized that even if *she* (Phyz) was content with 4 dimensions, or 11, or 26, Math could never stop there, nor would Phyz really want her to. Higher dimensions and curvature and bizarre topologies and even weirder variations on the theme of what space could be — Phyz now understood that all of this was part of what makes Math wonderful.

But Mathematics learned something too. All along, she’d thought of herself as the imaginative free-spirited one, and physics as the uncreative plodder. But then came string theory, and a particular prediction of string theory called the gauge-gravity correspondence. It was inspired by the physical world, and it might in the end make predictions about the real world, but beyond that possible application, it gave rise to beautiful new theorems. Who could have imagined physics providing inspiration to algebraic geometry? Algebraic geometry was one of the purest precincts of math. Surely if there was to be any traffic between the disciplines, math would inspire physics, and not the other way round! Yet in recent decades, ideas about fundamental particles that may or may not turn out to be good descriptions of the world we live in have provided inspiration to pure mathematicians, providing blueprints for some of the loftiest airborne castles mathematicians are trying to build.

The parade of ideas being imported from physics into mathematics doesn’t undermine my claim about math and physics being separate magisteria, but it sure does complicate it!

Some may rightly point out that traffic between math and physics has been bidirectional for a while. They’ll point to Richard Feynman’s non-rigorous path-integrals, or to Oliver Heaviside’s even earlier non-rigorous delta function, which didn’t fit into mathematics when they were first formulated, and whose successes forced an enlargement of mathematics. But string theory is the best example to date. It’s not clear whether, without physics to inspire them, mathematicians would have made the leaps of imagination that led to mathematical string theory — even though the standard stereotype is that mathematicians are the unfettered makers of creative leaps while physicists are constrained by the need to describe the physical world.

Anyway, getting back to our two Personifications, and to my imaginary movie about their divorce and remarriage: In the last scene of the film, Physics and Mathematics return to the place where they first took their vows, and we see Phyz giving a new present to Math: an arXiv preprint discussing new connections between mirror symmetry and the geometric Langlands program. A look of shock comes to Math’s face, replaced by a slowly dawning delight. We the moviegoers don’t know what kind of new relationship the two of them will have going forward, and we’re not sure they know either. But we can tell from the look on Math’s face that what has just been bestowed on her was absolutely, positively the perfect gift.

*Thanks to Sandi Gubin.*

**ENDNOTES**

#1. Here’s what Plato said (in the dialogue *Timaeus*) about the correspondence between four of the five regular solids (the cube, octahedron, tetrahedron, and icosahedron) and the four elements that comprised the physical world according to Greek thought (earth, air, fire, and water):

*“To earth, then, let us assign the cubical form; for earth is the most immoveable of the four and the most plastic of all bodies, and that which has the most stable bases must of necessity be of such a nature. Now, of the triangles which we assumed at first, that which has two equal sides is by nature more firmly based than that which has unequal sides; and of the compound figures which are formed out of either, the plane equilateral quadrangle has necessarily, a more stable basis than the equilateral triangle, both in the whole and in the parts. Wherefore, in assigning this figure to earth, we adhere to probability; and to water we assign that one of the remaining forms which is the least moveable; and the most moveable of them to fire; and to air that which is intermediate. Also we assign the smallest body to fire, and the greatest to water, and the intermediate in size to air; and, again, the acutest body to fire, and the next in acuteness to, air, and the third to water. Of all these elements, that which has the fewest bases must necessarily be the most moveable, for it must be the acutest and most penetrating in every way, and also the lightest as being composed of the smallest number of similar particles: and the second body has similar properties in a second degree, and the third body in the third degree. Let it be agreed, then, both according to strict reason and according to probability, that the pyramid is the solid which is the original element and seed of fire; and let us assign the element which was next in the order of generation to air, and the third to water. We must imagine all these to be so small that no single particle of any of the four kinds is seen by us on account of their smallness: but when many of them are collected together their aggregates are seen.”*

As for the fifth regular solid, the dodecahedron, Plato decided that it must correspond to some fifth element (or “quintessence”), and that since the number of its faces (twelve) is the number of signs in the Greek zodiac, it must be the element that the heavens are made of.

It should be stressed that Plato advanced this cosmology as a working hypothesis, not as what we would nowadays call “settled science”.

#2: As a side-note to my parable, I can’t resist mentioning that, to the Pythagoreans, the number five symbolized marriage, as it was the sum of the first “male” number (3) and the first “female” number (2). Presumably the Pythagoreans would have thought it more fitting to use the number 4 to symbolize the marriage of two females.

#3: When a long skinny object rotates, we may sometimes see it as being tall and thin (when its axis is vertical) and at other times as being low and long (when its axis is horizontal), but we don’t think anything essential about it has changed. This is all the more true if the object is stationary and we, the observers, are the ones doing the rotating. That’s because the three dimensions of space are interwoven. In Einstein’s theory of special relativity, time joins the weave but in a different way. A clock with a circular clock-face moving at close to the speed of light will appear to run slow and its face will not look circular. The same is true if the clock is standing still and we’re the ones who are moving. But the tempo and shape of the clock haven’t changed — just the relationship between it and the observer.

#4: One way to build up the theory of three-dimensional Euclidean geometry is to use coordinates in the manner pioneered by Descartes. Points become triples of numbers, and the distance between the point (*x*_{1},*y*_{1},*z*_{1}) and the point (*x*_{2},*y*_{2},*z*_{2}) is the square root of (*x*_{1}–*x*_{2})^{2}+(*y*_{1}–*y*_{2})^{2}+(*z*_{1}–*z*_{2})^{2}. We could build a 4-dimensional Euclidean space by using quadruples (*w*,*x*,*y*,*z*) instead of triples (*x*,*y*,*z*) and define the distance between the point (*w*_{1},*x*_{1},*y*_{1},*z*_{1}) and the point (*w*_{2},*x*_{2},*y*_{2},*z*_{2}) to be the square root of (*w*_{1}–*w*_{2})^{2}+(*x*_{1}–*x*_{2})^{2}+(*y*_{1}–*y*_{2})^{2}+(*z*_{1}–*z*_{2})^{2}. But for purposes of physics it’s better to use Minkowski space: points are still quadruples, but now our fourth coordinate is to be thought of as signifying time, and the “distance” between (*x*_{1},*y*_{1},*z*_{1},*t*_{1}) and (*x*_{2},*y*_{2},*z*_{2},*t*_{2}) is (*x*_{1}–*x*_{2})^{2}+(*y*_{1}–*y*_{2})^{2}+(*z*_{1}–*z*_{2})^{2}–(*t*_{1}–*t*_{2})^{2}. The minus sign in front of that last (*t*_{1}–*t*_{2})^{2} is crucial. Distances can now be negative numbers, corresponding to events in space-time that occur in a definite order no matter who observes them; meanwhile events at positive distance correspond to events that are causally separated, and events at distance zero correspond to points in spacetime along the path of a photon.
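To make the sign convention concrete, here is a small Python sketch (mine, not part of the endnote) that computes the Minkowski “distance” with the (+, +, +, –) convention just described and classifies pairs of events accordingly, in units where the speed of light is 1:

```python
def interval_squared(e1, e2):
    """Minkowski 'distance' (x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2 - (t1-t2)^2
    between two events, each given as a tuple (x, y, z, t); units with c = 1."""
    x1, y1, z1, t1 = e1
    x2, y2, z2, t2 = e2
    return (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2 - (t1 - t2) ** 2

def classify(e1, e2):
    s = interval_squared(e1, e2)
    if s < 0:
        return "timelike"   # the events occur in a definite order for all observers
    if s > 0:
        return "spacelike"  # the events are causally separated
    return "lightlike"      # the events lie along the path of a photon

origin = (0, 0, 0, 0)
# An event one light-second away and one second later sits on a photon's path:
print(classify(origin, (1, 0, 0, 1)))  # lightlike
```

With the opposite convention mentioned below, only the signs of the intervals flip; the three-way classification of event pairs comes out the same.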

(Some people prefer to use (*t*_{1}–*t*_{2})^{2}–(*x*_{1}–*x*_{2})^{2}–(*y*_{1}–*y*_{2})^{2}–(*z*_{1}–*z*_{2})^{2}. That also works, as long as you don’t get mixed up about which sign-convention you’re using.)

“That’s a teenager’s idea of what being a genius is like,” I would tell people.

“Oh, and are you a genius?” one woman once asked me skeptically.

“No,” I answered, “but I know a few.” Which was true: I’d been an undergraduate at Harvard, a visiting student at Cambridge University (where I’d worked closely with John Conway), and a graduate student at U. C. Berkeley before landing a tenure-track position at M.I.T. So I’d gotten a chance to interact with world-class mathematicians at close range, and Will Hunting resembled none of them.

The fact is, school math may have been easy for most of the people who go on to become research mathematicians, but if you really want to make a name for yourself as a researcher, you’re in competition with a lot of other people who were good at school math, *and* college math, *and* grad school math, and who also want to make names for themselves; the way to rise in the profession — or, putting things less competitively, the way to do the best work of which you’re capable — is to seek out things that are *hard* for you, but doable, and then do them. That’s what Andrew Wiles did when he set out to prove Fermat’s Last Theorem; that’s what Yitang Zhang did when he worked on a weakened (but still epochal) analogue of the twin primes conjecture; and a lot of my own research over the course of my career has required hard work. I wouldn’t be able to feel proud of an article I’d written if some effort hadn’t been involved. Plus, in some ways a proof is a story that I tell myself, and part of the fun is encountering surprises along the way; if I know in advance how everything is going to go, it’s less fun to write. I think most mathematicians resemble me in this respect.

But … what if someone was so good at mathematics (or, more realistically, a branch of mathematics or maybe two) that they just saw things that other people didn’t see, and produced breakthrough after breakthrough? Might such a person come to devalue the enterprise of creating new mathematics? Just because I hadn’t met anyone like that didn’t mean that such a person couldn’t exist.

In fact, there was such a twentieth-century mathematician, and his name was Alexandre Grothendieck (he preferred to spell his first name as “Alexander”, and close friends called him “Shurik”). By the time I came of age mathematically, this singular individual who in his younger days had done so much to revolutionize mathematics had turned his back on the field and was even urging others to do the same. You can read about Grothendieck in Rivka Galchen’s excellent article in the May 16, 2022 issue of the New Yorker. For a more mathematical treatment, see the article “Comme Appelé du Néant—As If Summoned from the Void: The Life of Alexandre Grothendieck” by Allyn Jackson, published in the Notices of the American Mathematical Society in 2004 in two installments (part 1 and part 2); Winfried Scharlau’s “Who Is Alexander Grothendieck?”, published in the Notices of the American Mathematical Society in 2008; and the obituary “Alexandre Grothendieck 1928–2014” by Michael Artin, Allyn Jackson, David Mumford, and John Tate, published in the Notices of the American Mathematical Society in 2014 in two installments (part 1 and part 2).

**MEASURES OF GREATNESS**

Grothendieck had two careers as a mathematician: the first was brilliant, and the second was spectacular. Galchen mentions the first one briefly. She writes: “While at the [University of Montpellier] — which was not an important center of mathematics — Grothendieck independently pursued research on ideas having to do with measures, a field that less gifted students might dismiss as obvious.” Here “measure” is a generic term that refers to things like length, volume, surface area, and analogous notions in higher dimensions. These concepts, at least in three or fewer dimensions, are so intuitive that you might think that there’s nothing much to be said that wasn’t known by the ancient Greeks. But there are paradoxes here to trip the unwary, even in just one dimension.

As a graduate student, the comparatively untrained Grothendieck ended up reproducing a famous result of the mathematician Henri Lebesgue in the theory of measures. Not discouraged when he learned he’d been scooped, Grothendieck went on to study the field of topological vector spaces, and tackled fourteen problems that his mentor, Laurent Schwartz (a future Fields Medalist), had proposed; in his doctoral thesis Grothendieck solved all of them, prompting Schwartz to say it might be time for him to stop teaching Grothendieck and start learning from him instead.

If Grothendieck had wanted, he could have continued in this line of work and been among the top dozen mathematicians of his generation, an expert mathematician respected around the world. Instead, at age 27 he switched to algebraic geometry and became, in a manner of speaking, a seer, though he would have described himself as a builder. Mathematician David Ruelle has said that in this phase of his life, Grothendieck (metaphorically speaking) built the entire ground floor of a vast cathedral. But Grothendieck did not do it alone.

**THE ODD COUPLE**

One member of Grothendieck’s circle in the 1950s and 1960s was Pierre Cartier (who incidentally came up with the application of Galois fields to coding theory that I described in one of my earlier essays). In 2010 Cartier wrote an article entitled “A country of which nothing is known but the name: Grothendieck and ‘motives’”. A few weeks after Grothendieck’s death in 2014, Cartier gave an interview to mathematician Sylvie Paycha. In both the article and the interview, Cartier describes the role played by some of Grothendieck’s colleagues, especially Jean Dieudonné.

The renowned Institut des Hautes Études Scientifiques, France’s equivalent of the Institute for Advanced Study, didn’t start out big; originally it had only two salaried employees, Grothendieck and Dieudonné. Dieudonné had made the hiring of Grothendieck a precondition of his coming to the newly founded I.H.E.S. The two were an odd couple: Dieudonné was an establishment type, politically conservative and happy to work within the system as it was, while Grothendieck was a political radical, prone to gestures like fasting one day a week to protest the ongoing war in Vietnam. Yet Dieudonné, for all that he was a world-class mathematician, saw that young Grothendieck, twenty years his junior and in many ways his opposite, was truly exceptional, and set about to become a conduit for the younger mathematician’s insights.

Both were part of a semi-secret mathematical collective called Bourbaki. (They didn’t keep their existence a secret at all; they published dozens of books. They did try to conceal who the members of the collective were, but they didn’t try very hard.) Here’s how the collaboration between Grothendieck and Dieudonné worked: Grothendieck would write his ideas in the form of sketches, feverishly writing all night, and presenting his notes to the older mathematician at five in the morning. At that point, Dieudonné would spend several hours fleshing out Grothendieck’s notes and writing them up in a form that was suitable for sharing with the other members of Bourbaki.

Cartier describes an incident in which a frustrated Dieudonné, in a move more reminiscent of young Will Hunting than his mentor, dramatically tore up what he’d presented to the group, at which point two of the other mathematicians dove to the floor to rescue it (a step that proved to be unnecessary as Dieudonné had prudently made a second copy).

Earlier in this essay I asked “What if someone was so good at mathematics that they just saw things that other people didn’t see?” But to say that Grothendieck was good at math misses an important point. Grothendieck himself, though not prone to false modesty, would be the first to say that his colleagues Jean-Pierre Serre and Pierre Deligne had more raw talent than he did. Two things made him more productive than the others during this period. First, he worked incredibly hard (I mentioned his serial all-nighters); never would Grothendieck have claimed, Will-Hunting-ishly, that the work was easy. Second, Grothendieck was *different*. Some of that difference came from having been self-taught, so that he had to find his own idiosyncratic ways of doing things. (Most of the time, being self-taught is a disadvantage; not so in Grothendieck’s case.) But some of that difference may have been inborn. Cartier describes Grothendieck as a kind of eagle, soaring up to great heights to get a panoramic view of its prey before attacking in one blinding strike. Mathematician Michel Demazure had a different way of describing the Grothendieck difference: “He did math upside down.” Most of us start with examples, and build up the intuition that lets us grasp generalities. Grothendieck started with the generalities and somehow managed not to lose his footing among the clouds.

Grothendieck had other, gentler ways of describing his style of doing math. One metaphor he liked was soaking a nut in water and letting it slowly soften rather than trying to crack it open. Another was the metaphor of a slowly rising sea: nothing dramatic happens, but the land (that is, the problem) that seemed so imposing before has somehow disappeared. He also claimed that at least one of his ideas (the notion of a “scheme”) came not from soaring, as in Cartier’s image, but from stooping lower than anyone else had dreamed of stooping. For more on Grothendieck’s views of his own style, see Colin McLarty’s “The Rising Sea: Grothendieck on Simplicity and Genius”; for the story of schemes, see McLarty’s “How Grothendieck Simplified Algebraic Geometry” (from the March 2016 issue of the Notices of the American Mathematical Society).

**BEING GOOD**

I’ve never much liked the title of *Good Will Hunting*. Sure, Will is good at math, but as I recall, he only progresses from being a total jerk to being a partial jerk; he never becomes an actively good person, unselfishly striving to make the world a better place.

Grothendieck, in contrast, was very much concerned with being good, but he tended to focus on the good of the world as a whole, not on the good of people around him. In this he reminds me of the old joke about the married couple in which the wife takes care of all the unimportant decisions (where to live, what car to buy, how to raise the kids, etc.) and the husband takes care of all the important decisions (what the president should do about the economy, what the president should do about the environment, etc.). Grothendieck married several times and had several children, but he was a neglectful husband and father, and he was sometimes cruel in a high-minded way. This dual tendency is highlighted in a story that mathematician Barry Mazur tells about a time when he and his wife Gretchen had dinner with Grothendieck and his wife-at-the-time Mireille. Mazur went to extraordinary pains to prepare a vegetarian meal for the Grothendiecks since Alexandre was a principled vegetarian. After profusely thanking Barry and Gretchen for the wonderful spread, Alexandre turned to Mireille and harshly told her “See how easy it is to make a vegetarian meal!” This streak of intolerance would eventually grow to the point where it alienated every single one of Alexandre’s friends.

Cartier describes one of the more momentous ruptures in Grothendieck’s career, when he and Dieudonné parted ways as a result of a dispute about where and how Grothendieck could distribute political pamphlets during a mathematical conference. He abruptly resigned from the I.H.E.S. in 1970, and incredibly, soon thereafter he stopped publishing mathematics. As Ruelle says: after building the ground floor of his cathedral with his bare hands, Grothendieck turned his back on it and walked away. He began to claim that mathematicians and physicists were the most dangerous people in the world because they gave politicians access to tools that could destroy the human race. He became a hermit in the Pyrenees, with no phone and no postal address.

**THE ABSOLUTE GALOIS GROUP**

Galchen’s article discusses the mathematicians Leila Schneps and Pierre Lochak, who in the 1990s decided to track down the hermit. Schneps had become captivated by his work, especially his strange memoir *Récoltes et Semailles* (“Reapings and Sowings”), and she was eager to meet the mind that had created it.

One mathematical creation of Grothendieck’s that had enchanted Schneps was a subject to which Grothendieck had given the curious name “*dessins d’enfants*”, or “children’s drawings”, in a 1984 research proposal that he never published but which was informally circulated for over a decade before Schneps and Lochak published it in 1997. Following the proposed research program, one would be able to catch glimpses of the absolute Galois group, one of the most complicated mathematical objects ever studied, through what might seem the narrowest of windows: simple drawings of lines and curves on a piece of paper joining black and white dots.

I’ll try to give you a bit of the flavor of the absolute Galois group. Mathematicians distinguish between, on the one hand, numbers like the square root of two and the imaginary number *i* which are defined by algebraic equations (*x*^{2} = 2 and *x*^{2} = –1, respectively), and, on the other hand, numbers like *e* and *π* which can’t be defined in this way. Numbers of the former type are called algebraic; numbers of the latter type are called transcendental. The set of all algebraic numbers is often represented by the symbol ℚ̄ (pronounced “Q-bar”), not to be confused with the symbol ℚ, which represents the much smaller set of all rational numbers.

Gal(ℚ̄/ℚ) (“gal-q-bar-over-q”), the absolute Galois group, is an infinite collection of mathematical operations that you can perform on ℚ̄, of which only two can be concretely described. One of them is the operation called conjugation, which (for instance) turns 3+4*i* into 3–4*i* and vice versa. If you perform conjugation twice, you get back what you started with; this do-nothing operation is the other graspable element of Gal(ℚ̄/ℚ). The rest exist, but in a much more tenuous way; you can’t really get your hands on any of them, because to specify any single one of them requires that you make an infinite sequence of choices, some of which are arbitrary (Buridan’s ass comes to mind) and others of which are forced by earlier choices, in a garden of forking paths from which there is no exit. Moreover, although the set ℚ̄ can be squashed into the plane, the squashing does violence to its essential nature by marring some of its symmetry; the place ℚ̄ really “wants” to live is an infinite-dimensional space, where the absolute Galois group can be seen as the group of symmetries of an impossibly intricate sort of crystal.
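What makes conjugation a bona fide symmetry, and not just an arbitrary swap, is that it respects addition and multiplication. You can spot-check that property numerically. Here is a toy Python illustration (mine, and only an analogy: the real conjugation acts on the algebraic numbers themselves, while Python’s `complex` type uses floating point):

```python
# Toy check that conjugation behaves like a field symmetry: it preserves
# sums and products, and applying it twice is the do-nothing operation.
a = 3 + 4j   # stands in for 3 + 4i
b = 1 - 2j

assert (a + b).conjugate() == a.conjugate() + b.conjugate()
assert (a * b).conjugate() == a.conjugate() * b.conjugate()
assert a.conjugate().conjugate() == a  # conjugation twice is the identity
```

Every element of the absolute Galois group respects addition and multiplication in exactly this way; what makes the others so elusive is that, unlike conjugation, they cannot be pinned down by any finite description.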

The absolute Galois group casts various shadows that are easier to understand: the (relative) Galois groups Gal(*K*/ℚ), where *K* is a number field intermediate between ℚ and ℚ̄. (I wrote a bit about towers of number fields in an earlier essay.) We can try to understand the absolute Galois group by way of its shadows, but the absolute Galois group itself still remains beyond our grasp.

I hope I’ve convinced you that Gal(ℚ̄/ℚ) is an extremely abstract thing, so that you can appreciate how amazing it is that Grothendieck gave the world a new way to think about Gal(ℚ̄/ℚ) using the most homely of tools: childish-looking drawings. One might go so far as to compare *dessins* with Will Hunting himself: an unlikely source of deep mathematical insight. Moreover, some of the *dessins* bear a striking (albeit coincidental) resemblance to some diagrams Will drew on a blackboard in the hallway.

Schneps was especially charmed by the way Grothendieck drew actual pictures to explain his ideas. The sage usually stayed up in the clouds of abstraction, letting others figure out specific cases. Yet when it came to *dessins d’enfants*, he wasn’t content merely to lay the foundations of a general theory; he also worked out specific examples.

In their search for Grothendieck, Schneps and Lochak met with success, of a kind; they did eventually meet their hero. But he wasn’t always friendly, and he never, ever discussed math with them. That part of his life was over.

(For another account of a Grothendieck fan tracking the man down, see Roy Lisker’s diary of his quest for Grothendieck: “Visiting Alexandre Grothendieck”.)

After Grothendieck’s death, Schneps and others were determined to preserve the thousands of pages of writing that Grothendieck had created during his decades living off-the-grid. Many of these writings are difficult or impossible to classify using standard categories of scientific writing, autobiography, or literature. For details of the work of the people dedicated to preserving Grothendieck’s mathematical and post-mathematical legacy, see the Grothendieck Circle webpage.

So, if you ask me nowadays about the movie *Good Will Hunting*, I won’t say that there couldn’t be such a character. Because in a way, there once was. (But his mentor? No way! Nobody gets an office that big.)

*Thanks to Leila Schneps.*

Langford noticed that between the two red blocks was *one* block, between the two blue blocks were *two* blocks, and between the two yellow blocks were *three* blocks. Being a mathematician, Langford immediately wondered “Could we do this with more than three colors?”

Can you figure out how to add two green blocks and arrange the eight blocks so that there will be *one* block between the red blocks, *two* blocks between the blue blocks, *three* blocks between the yellow blocks, and *four* blocks between the green blocks?

And, having succeeded with four colors, can you do it with five?

Langford’s problem is, for which positive integers *n* is it possible to take 2*n* colored blocks, with two blocks of each color, and stack them so that for all *k* between 1 and *n* there are exactly *k* blocks between the two blocks of color #*k*?

Or, if you prefer, for which positive integers *n* is it possible to write down a sequence of numbers consisting of two 1’s, two 2’s, …, and two *n*‘s, so that for all *k* between 1 and *n* there are exactly *k* numbers between the two *k*‘s? Such a sequence is called a *Langford sequence*.

I already showed you how to build the stack of blocks for *n*=3 (corresponding to the Langford sequence 312132), and if you look at Endnote #2 you’ll see one way to do it for *n*=4. But no matter how you try, you won’t be able to do it for *n*=5. That is, there are Langford sequences of order 3 and 4 but there isn’t one of order 5.
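Of course, you can let a computer do the trying for you. Here’s a short backtracking search in Python (my sketch, not part of the original discussion) that finds a Langford sequence of order *n* or reports that none exists:

```python
def langford(n):
    """Return a Langford sequence of order n as a list, or None if none exists.
    Backtracking: place the pairs k = n, n-1, ..., 1 into 2n slots; the two
    copies of k must sit exactly k+1 slots apart (k numbers between them)."""
    seq = [0] * (2 * n)

    def place(k):
        if k == 0:
            return True
        for i in range(2 * n - k - 1):
            j = i + k + 1  # slot for the second copy of k
            if seq[i] == 0 and seq[j] == 0:
                seq[i] = seq[j] = k
                if place(k - 1):
                    return True
                seq[i] = seq[j] = 0  # undo this placement and keep searching
        return False

    return seq if place(n) else None

print(langford(3))  # [3, 1, 2, 1, 3, 2], i.e. the sequence 312132
print(langford(5))  # None: the search exhausts every arrangement
```

The search settles any particular small case, but it can’t explain *why* order 5 fails; for that you need the argument that follows.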

**WHY NOT?**

Let’s assume for argument’s sake that there were a Langford sequence of order 5. Let *a* and *b* be the positions of the 1s, *c* and *d* the positions of the 2s, *e* and *f* the positions of the 3s, *g* and *h* the positions of the 4s, and *i* and *j* the positions of the 5s; for instance, if our Langford sequence of order 5 began 1 2 1 3 2 … we’d have *a*=1 and *b*=3 (since the 1s are in the 1st and 3rd positions), *c*=2 and *d*=5 (since the 2s are in the 2nd and 5th positions), etc.

We don’t know the values of *a* and *b*, but we know that they must differ by 2 (because the two 1s have a single number between them), so *a* and *b* are either both even or both odd; either way, the sum *a*+*b* is *even*.

In a similar way, we know that *c* and *d* differ by three, which means that one is odd and one is even, so that the sum *c*+*d* is *odd*. And for the same reason we know that *e*+*f* is *even* and *g*+*h* is *odd* and *i*+*j* is *even*.

Now this tells us that (*a*+*b*)+(*c*+*d*)+(*e*+*f*)+(*g*+*h*)+(*i*+*j*) is of the form *even* plus *odd* plus *even* plus *odd* plus *even,* which must be an even number (see Endnote #3). On the other hand, the ten numbers *a*, *b*, *c*, *d*, *e*, *f*, *g*, *h*, *i*, *j* are just the ten numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 in some scrambled order, and you can readily check that 1+2+3+4+5+6+7+8+9+10 is 55. So if you believe that a Langford sequence of order 5 exists, you must believe that 55 is an even number. But you don’t really believe 55 is even, do you? So you must admit that there isn’t a Langford sequence of order 5 after all.
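The same parity bookkeeping works for any order *n*, and it’s easy to automate. In this Python sketch (my extension of the argument above, not something from the text), the two positions of color *k* differ by *k*+1, so their sum is even when *k* is odd and odd when *k* is even; the grand total of all 2*n* positions must therefore have the same parity as the number of even values of *k* between 1 and *n*, which is *n* divided by 2 rounded down. But that grand total is 1 + 2 + … + 2*n* = *n*(2*n*+1), whose parity is already fixed:

```python
def parity_allows(n):
    """Necessary condition for a Langford sequence of order n to exist:
    the parity of 1 + 2 + ... + 2n = n(2n+1) must match the parity of
    floor(n/2), the number of pairs whose position-sum is odd."""
    total_parity = (n * (2 * n + 1)) % 2
    required_parity = (n // 2) % 2
    return total_parity == required_parity

print([n for n in range(1, 13) if parity_allows(n)])  # [3, 4, 7, 8, 11, 12]
```

So the parity obstruction rules out every *n* except those of the form 4*m* or 4*m*+3; it is a classical result that Langford sequences do in fact exist for all such *n*.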

**GOTCHAS**

Now, if you haven’t seen proof by contradiction before, this argument might seem suspiciously close to circular reasoning, where I “prove” a conclusion by sneakily assuming it at the beginning. But the circle here is actually more akin to the circle formed by a snake devouring its own tail, for here what we assume is the *negation* of what we’re trying to prove, and we show that the assumption contains the seeds of its own destruction.

A different sort of objection is that the proof, although logically impeccable, is socially obnoxious. Imagine, for comparison, that you and I are discussing some moral issue that we disagree about, and I say “Let’s assume for argument’s sake that you’re right,” only to force-march you to some horrible consequence you don’t subscribe to (“… so, clearly you must believe killing babies is a GOOD thing!!”). My earlier show of open-mindedness is revealed to have been a sham all along, just a sneaky, troll-ish way of setting you up for a “Gotcha!” moment.

What renders proof by contradiction inoffensive in comparison with the aforementioned trolling behavior is that in math we are *trolling ourselves*. And, far from being shaming, the “Gotcha!” is often exonerating. The reason I can’t construct a Langford sequence of order 5 isn’t that I’m not clever enough — it’s that it can’t be done.

A third objection is “Wait, that’s not math; that’s just a *trick*!” I can imagine this objection being voiced by someone whose only experience of math is the dreariest kind of school math, such as endless drills of long division. To such an unfortunate student, math is all about *method* (“When you encounter thus-and-such a problem, here is what you do”), and it strikes the student as deeply unfair to present them with a problem without first teaching them how to solve problems of that type. Furthermore, the gimmicky numbering seems to descend from the sky like a *deus ex machina*, and after it’s done its work it disappears again to wherever it came from.

There’s a mathematical aphorism that a method is just a trick that works more than one time (though some versions of the motto use a higher cutoff than one). In the rest of this essay, I’ll show two more examples of the trick you just saw, and hopefully I’ll convince you that it contains the seeds of a method.

**DE BRUIJN’S PUZZLE**

Can you arrange nine 1-by-4 rectangles (which you are also allowed to rotate to become 4-by-1 rectangles) to tile a 6-by-6 square? Here, I’ll get you started by placing two of the nine rectangles:

Certainly it seems plausible that the nine rectangles can be arranged to cover the square, since the total area of nine 1-by-4 rectangles is 36, which is also the area of a 6-by-6 square. But no matter how you try to arrange the nine rectangles, there’ll always be at least one rectangle that sticks out beyond the confines of the big square, as well as some parts of the square that aren’t covered by any rectangle. Try it! But don’t spend too long trying, and don’t be hard on yourself for failing. It’s not that you’re not clever enough; it simply can’t be done.

But how do we prove that your failure wasn’t due to lack of cleverness? It takes a bit of cleverness to prove that! But if you’ve seen how we dealt with Langford sequences of order 5, the cleverness won’t come totally out of the blue. Let’s put numbers in the 1-by-1 squares that make up our 6-by-6 square in diagonal stripes, like so:

I claim that if you place a 1-by-4 rectangle (pointing either horizontally or vertically) so that it covers four 1-by-1 squares, then the numbers in those squares add up to an even number that is not divisible by 4. (Such numbers are, or used to be, called “oddly even”: they’re even numbers, but when you divide them by two you get an odd number. They are precisely the numbers that leave a remainder of 2 when you divide them by 4.)

Check it out: if you put a 1-by-4 rectangle anywhere on the square, add the numbers that it covers, and divide by 4, the remainder will be 2. (In the picture, we have 4+5+6+7=22 and 8+9+10+11=38, both of which are oddly even.)

You don’t have to exhaustively try all the possible positions of the rectangle to see why my claim is true. Notice that when you slide a rectangle one square to the right or one square downward, all four numbers covered by the rectangle increase by 1, so that the sum increases by 4, which means that if the old sum was oddly even, the new sum must be oddly even too. So, in the case of a horizontal rectangle, it’s enough to verify that 1+2+3+4 (the sum of the numbers covered by a 1-by-4 rectangle in the upper left corner of the square) is oddly even. The same argument works for a vertical rectangle. So I’ve shown that no matter where you put the small rectangle, the sum of the four numbers it covers will be oddly even.
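If you'd rather check by machine than by sliding, here's a quick verification of my own. It assumes the diagonal stripes give the cell in row *r*, column *c* the value *r* + *c* + 1, which is consistent with the sums 1+2+3+4, 4+5+6+7, and 8+9+10+11 quoted above.

```python
def val(r, c):
    """Number in row r, column c of the 6x6 board, assuming the
    diagonal stripes assign the value r + c + 1."""
    return r + c + 1

# sum of the covered numbers for every possible 1x4 placement
sums = []
for r in range(6):
    for c in range(6):
        if c + 3 < 6:  # horizontal 1-by-4 placement starting at (r, c)
            sums.append(sum(val(r, c + i) for i in range(4)))
        if r + 3 < 6:  # vertical 4-by-1 placement starting at (r, c)
            sums.append(sum(val(r + i, c) for i in range(4)))

print(all(s % 4 == 2 for s in sums))  # True: every placement is oddly even
```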

How does this help us? Well, let’s assume for argument’s sake that we can tile the big square with nine small rectangles. Each of the small rectangles covers numbers whose sum is oddly even, so if we take the grand total of all the covered numbers, we get a sum of nine oddly even numbers. And that sum must again be an oddly even number (see Endnote #4).

You’re probably guessing the form that the “Gotcha!” will take: I’m about to claim that if we add up the 36 numbers in the square, we get a total that isn’t oddly even. You’re right about that. But being lazy, I don’t want to add up those 36 numbers. Yet I still want a way to convince you that their sum isn’t oddly even.

In fact, I’m going to convince you that the sum is “evenly even” (i.e., a multiple of four). And I’ll prove this using a *different* way to tile a 6×6 square, namely, using nine 2×2 blocks.

Each 2×2 block contains (for some *i*) the numbers *i*–1, *i*, *i*, and *i*+1, whose sum is 4*i*, which is a multiple of 4. Since each block-sum is a multiple of 4, the sum of the nine block-sums will also be a multiple of 4.

So, having used the purported tiling of the 6×6 square by nine 1×4 rectangles to show that the sum of the thirty-six numbers leaves remainder 2 when you divide it by 4, and having used the tiling of the 6×6 square by nine 2×2 squares to show that the sum of the thirty-six numbers leaves remainder 0 when you divide it by 4, you’re faced with a choice: Do you want to believe in the first tiling or the second? Since the first tiling is something you haven’t been able to construct, and the second tiling is something I actually showed you, I’m pretty sure what choice you’ll make.
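For the record, the grand total can also be checked directly, again assuming the diagonal stripes give the cell in row *r*, column *c* the value *r* + *c* + 1 (this is my own spot-check, not part of De Bruijn's argument):

```python
# the grand total of all 36 numbers on the striped 6x6 board
total = sum(r + c + 1 for r in range(6) for c in range(6))
print(total, total % 4)  # 216 0 -- remainder 0, not the 2 a 1x4 tiling would force

# the same total, block by block: nine 2x2 blocks, each summing to a multiple of 4
blocks = [
    sum((2 * R + dr) + (2 * C + dc) + 1 for dr in (0, 1) for dc in (0, 1))
    for R in range(3) for C in range(3)
]
print(all(b % 4 == 0 for b in blocks))  # True
```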

**ANOTHER TILING PUZZLE**

The following puzzle is one I devised; it’s a birthday present to my friend Michael Larsen (see Endnote #5).

Here’s a region called an Aztec diamond of order 3:

Michael and I (along with Noam Elkies and Greg Kuperberg) counted the ways to tile an Aztec diamond of order *n* with 1-by-2 rectangles, and if you want to know more about that, you can read my earlier essay “My life with Aztec diamonds” or watch a nice Mathologer video (see the References). But domino tilings are *so* twentieth century; let’s try something new.

Here’s a way to tile the Aztec diamond of order 3 using 1-by-4 rectangles and other tiles that look something like the letter S or the letter Z.

Solomon Golomb dubbed tiles made of four 1×1 cells *tetrominos*. I’ll call 1-by-4 rectangles (in either orientation) *straight* tetrominos, and S- and Z-shaped tetrominos (in any orientation) *skew* tetrominos.

We can tile the Aztec diamond of order 4 with straight tetrominos and skew tetrominos:

But what about the Aztec diamond of order 5?

It has area 60, and each tetromino has area 4, so there could conceivably be a way to use 15 straight or skew tetrominos to tile the Aztec diamond of order 5. Yet no matter how you try, you’ll fail.

Do you think you were insufficiently clever to find the tiling? Or do you think there’s a trick that shows that nobody, no matter how clever, could succeed where you failed?

My next picture probably won’t surprise you:

I claim that no matter how you place a straight tetromino on this board, you’ll cover four consecutive integers. Also, no matter how you place a skew tetromino on this board, you’ll either cover four consecutive integers or you’ll cover two consecutive integers with each occurring twice (such as 1+1+2+2 or 2+2+3+3). In either case, the sum will be an oddly even number.

What about the sum of all 60 numbers? You can’t dissect an Aztec diamond into 2×2 squares as we did in solving De Bruijn’s puzzle. Are we going to have to hunker down and add up those 60 numbers? Fortunately, there is once again a nice way to divide the grid cells up into foursomes, but this time the cells in a foursome won’t form a contiguous block. Cut the Aztec diamond into quadrants and imagine mirrors along the boundaries, so that each grid cell in the northeast quadrant has mirror images in the other three quadrants. (For instance, the boldface **5** is mirrored by the boldface **2**, **7**, and **10**.)

I claim that any grid cell in the northeast quadrant (call it the primary cell) and its three mirror images cover numbers that add up to a multiple of four. To see this, start the primary cell at the lower left corner of the northeast quadrant. Then the primary cell and its mirror images form a 2×2 block, and we already saw that the numbers in such a block add up to a multiple of four.

Now imagine taking that grid cell in the northeast quadrant and sliding it one step east, with its three mirror images sliding too.

Two of the cells slide east and two slide west, so two of the numbers increase by 1 while the other two decrease by 1, with a net change of zero. The same invariance is observed if you take the primary cell and slide it one step north, or if you apply multiple sliding moves.

So, dividing the Aztec diamond into foursomes of cells related by mirror symmetry, we get foursomes whose associated numbers add up to a multiple of four. Adding all 15 such sums, we find the sum of all the numbers must also be a multiple of four.

But we saw earlier that if there is a way to tile the Aztec diamond of order 5 with straight and skew tetrominos, then the sum of the numbers is not a multiple of four. So there’s no way to tile the Aztec diamond of order 5 with straight and skew tetrominos.

**CAN WE GENERALIZE?**

I still haven’t told you the answer to Langford’s problem, which asks: for which positive integers *n* is it possible to arrange 2*n* colored blocks in a row, with two blocks of each of *n* colors, so that for all *k* between 1 and *n* there are exactly *k* blocks between the two blocks of color #*k*?

You can easily convince yourself that there’s no such structure when *n*=1 or *n*=2; we saw that such structures do exist when *n*=3 or *n*=4; and we reasoned that they don’t exist when *n*=5. Using the same reasoning (but with algebra instead of arithmetic) you can show that if *n* leaves remainder 1 or 2 when you divide it by 4, then there is no Langford sequence of order *n*. This leaves open the question of values of *n* that leave remainder 3 or remainder 0 when you divide *n* by 4. It was shown by Davies that in those cases, you can *always* construct a Langford sequence of order *n*.
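Here is the algebra in compact form (my sketch of the computation the text alludes to, with my own notation). If the two blocks of color *k* occupy positions *a*_*k* and *a*_*k* + *k* + 1 in a row of length 2*n*, then summing all 2*n* positions gives

```latex
\sum_{k=1}^{n} \bigl( a_k + (a_k + k + 1) \bigr) = 1 + 2 + \cdots + 2n = n(2n+1),
\qquad \text{so} \qquad
\sum_{k=1}^{n} a_k = \frac{n(2n+1) - \frac{n(n+3)}{2}}{2} = \frac{n(3n-1)}{4}.
```

The left-hand sum is a sum of whole numbers, so *n*(3*n*−1)/4 must be an integer, which happens exactly when *n* leaves remainder 0 or 3 upon division by 4; Davies' constructions handle the converse.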

A similar situation prevails for the Aztec diamond tiling problem. If *n* leaves remainder 1 or 2 when you divide it by four, then the argument I gave shows that the tiling is impossible; and it’s fairly easy to show that when the remainder is 3 or 0, then a tiling of the desired kind exists. Below I show a tiling for *n*=11 that fills the bill. (Actually, it shows only the top half of the diamond; to get the bottom half, flip the picture.) By adding two extra rows, each consisting of six straight tetrominos, you get a tiling for *n*=12. It’s not hard to continue the pattern for *n*=15, 16, 19, 20, etc.

What about generalizing De Bruijn’s puzzle? Here the answer is that an *m*-by-*n* rectangle can be tiled with 1-by-4 (and 4-by-1) rectangles if and only if at least one of the two numbers *m*, *n* is a multiple of 4. This was first proved by De Bruijn himself; Golomb’s book “Polyominoes” is one place you can turn to for a proof. But with the trick I’ve taught you, you should be able to prove it for yourself.
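If you'd like to spot-check De Bruijn's criterion before trying to prove it, here is a small brute-force tiler of my own (not De Bruijn's method, just an exhaustive search) that agrees with the criterion on all small boards:

```python
def tileable(m, n):
    """Can an m-by-n board be tiled by 1x4 and 4x1 rectangles?
    Simple backtracking: always try to cover the first empty cell."""
    if (m * n) % 4:
        return False
    grid = [[False] * n for _ in range(m)]

    def solve():
        for r in range(m):
            for c in range(n):
                if not grid[r][c]:
                    # try a horizontal piece covering (r, c)..(r, c+3)
                    if c + 3 < n and not any(grid[r][c + i] for i in range(4)):
                        for i in range(4):
                            grid[r][c + i] = True
                        if solve():
                            return True
                        for i in range(4):
                            grid[r][c + i] = False
                    # try a vertical piece covering (r, c)..(r+3, c)
                    if r + 3 < m and not any(grid[r + i][c] for i in range(4)):
                        for i in range(4):
                            grid[r + i][c] = True
                        if solve():
                            return True
                        for i in range(4):
                            grid[r + i][c] = False
                    return False  # first empty cell can't be covered
        return True  # no empty cells left

    return solve()

# De Bruijn's criterion: tileable iff 4 divides m or 4 divides n
for m in range(1, 9):
    for n in range(1, 9):
        assert tileable(m, n) == (m % 4 == 0 or n % 4 == 0)
print("criterion confirmed for all boards up to 8x8")
```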

Golomb turned polyominos into a board game (featured in the movie “2001: A Space Odyssey”), but the success of the game was eclipsed by the Tetris craze. I’ve sometimes wondered whether Golomb ever regretted failing to invent Tetris.

**THE LONG-DELAYED GATHERING**

I could have learned about Langford’s problem from Martin Gardner’s famous “Mathematical Games” column in Scientific American, but I didn’t (I was only seven at the time and not yet hooked on his column); instead, I learned about it last week at a gathering held in Atlanta, Georgia in honor of Gardner, though I attended only remotely. These gatherings, usually held every two years or so, have been going on since 1993, and the fourteenth Gathering for Gardner was scheduled to take place in the Spring of 2020 when the coronavirus pandemic led to its postponement (see Endnote #6). It was postponed several times and finally held in hybrid format earlier this month. One of the speakers, John Miller, gave a six-minute talk about the Langford problem. In the references I give a link to Miller’s essay on Langford’s problem (as well as a reference to Gardner’s original article mentioning the Langford problem).

There were dozens of stimulating talks at the fourteenth Gathering for Gardner, aka G4G14, and I could write an essay about many of them. I singled out Miller’s because it ties in with a general theme that interests me: the difference between tricks and methods.

Come to think of it, the idea of tricks in math ties in with one of Gardner’s other main interests: magic. One difference between a magic trick and a math trick is that when a magic trick is explained, some of the wonder goes away; there’s a feeling of disenchantment. Consequently, magicians are reluctant to explain their tricks. In contrast, we mathematicians are only too happy to explain our tricks, which provide both surprise and enlightenment. If you’re a fan of this kind of trick, Gardner’s books “Aha! Gotcha” and “Aha! Insight” are likely to please you.

The art of solving hard math problems is in part a matter of learning lots of tricks. A really good problem doesn’t come with a roadmap guiding you to a solution; you have to find your own path. But that doesn’t mean you have to be clever. It can be enough to have a good memory, so that you can say “Hey, this problem reminds me of problem X; I couldn’t solve problem X, but someone showed me a clever solution using method Y, so maybe method Y will work with this new problem!”

I don’t know an all-purpose method for solving math problems, but here’s the closest thing I know. To turn yourself into the kind of person who solves math problems, tackle as many problems as you can and learn all the different tricks that other people have come up with. Then, when you’re faced with a new problem, you’ll be able to think of three or four problems with a similar feel, and your experience with those problems will dictate three or four different approaches to solving the problem before you.

Once in a great while (once in a career, maybe) you may stumble upon a trick that nobody’s thought of before. And who knows? Every method was once a trick; maybe someday your trick will grow up to become a method.

**ENDNOTES**

#1: The Langford story belongs to a genre that I love: the math-parent story. I told one such story in the first year of my blog (“Lessons of a square-wheeled trike”) and more recently I’ve told two more in which I play the role of the parent (“The paintball party and the habit of symmetry” and “Here there be dragons”); if you know of others, from mathematical history or your own life, please post details in the Comments!

#2: One sequence that does the job is 41312432.

#3: If it helps, imagine we have five piles of cookies whose sizes are even, odd, even, odd, and even, respectively, and we want to share the cookies fairly between two children. We can divide the first, third, and fifth piles neatly in two, but when we try to do it for the second pile there’s one cookie left over, and likewise for the fourth pile. What to do? Take those two left-over cookies and give one to each of the children. There are no cookies left over, and each child has received equally many, so the total number of cookies was even.

#4: Again, think about cookies, but this time we’re dividing them among four kids. Each of the nine piles leaves a remainder of 2 when we divide it by 4. This leaves 9 times 2, or 18, leftover cookies. When we try to divide the 18 cookies four ways, there will be 2 left over.

#5: My present to Michael consisted of not just a single puzzle, but a whole bunch of questions inspired by our past work, most of which I don’t know the answer to. Just as Samuel Butler quipped that a chicken is an egg’s way of making another egg, sometimes a theorem is a problem’s way of making new problems.

#6: I’d originally planned to attend in person; indeed, my essay “What Proof is Best?”, which prominently features the number 14, was an expanded version of the talk I’d planned to give. Ultimately I decided to attend the fourteenth Gathering for Gardner remotely rather than in person.

**REFERENCES**

R. O. Davies, “On Langford’s Problem. II.” *Math. Gaz.* **43**, 253–255 (1959).

Martin Gardner, “Aha! Gotcha” (1982).

Martin Gardner, “Aha! Insight” (1978).

Martin Gardner, “Mathematical Magic Show”, chapter 5, problem 6.

Solomon Golomb, “Polyominoes” (1965).

Mathologer, “Why do physicists play with dominos?” (video)

John Miller, “Langford’s Problem, Remixed”

James Propp, “Some 2-adic conjectures concerning polyomino tilings of Aztec diamonds”, https://arxiv.org/abs/2204.00158

As a teenager I was captivated by a 1973 book called *Communication with Extraterrestrial Intelligence*. It was edited by a not-yet-world-famous astronomer named Carl Sagan who was interested both in sending messages to the stars and in seeking messages from the stars to us. He went on to host the incredibly popular TV program “Cosmos” and to write several best-selling books, including the novel *Contact* about which I’ll have a lot to say later.

The reason I’m writing this particular essay this month is that almost exactly two centuries ago, the mathematician and astronomer Carl Friedrich Gauss proposed sending a message to the moon. (Gauss’ ideas about life on other worlds had a respectable pedigree in European thought; see the excellent articles by Aldersey-Williams and Dillard listed in the References.) Gauss had invented a kind of signaling device he called the heliotrope, and on March 25, 1822, he wrote a letter to the astronomer Heinrich Olbers, saying “With 100 separate mirrors, each of 16 square feet, used conjointly, one would be able to send good heliotrope-light to the moon. … This would be a discovery even greater than that of America, if we could get in touch with our neighbors on the moon.”

Gauss (or perhaps a contemporary of his) made a related proposal to install in the wheat fields of Siberia an enormous diagram of a 3-4-5 right triangle, embellished with extra lines in the manner of the proof of the Pythagorean theorem given in Euclid’s *Elements*. A big enough diagram would be visible from the moon and would prove to the moon-dwellers that there is intelligent life on Earth. (See Endnote #1.) It’s like aliens announcing their presence to us using crop-circles, in reverse. But speaking of circles, it’s worth noting that a giant circle would not have served the purpose of announcing the existence of intelligent Earthlings since many natural processes give rise to circles (such as the impacts that created circular craters on the Moon). On the other hand, Nature has shown no interest in proving theorems, and no physical process has ever been discovered that creates diagrams like Euclid’s.

Nowadays we know enough about other planets in our solar system and their moons to know that they are all inhospitable to life as we know it. If we seek cosmic company, we must look to solar systems light-years away from our own. At interstellar distances, even planet-sized pictures are illegible. So we must give up on the 19th century idea of communicating through pictures.

Or must we? For nearly a century, humans have contrived to convey pictures across great distances through the magic of television, which divides an image up into pixels and transmits those pixels through the air via radio waves. There’s no technical obstacle to our sending electromagnetic signals into outer space. Indeed, humankind began doing so, unthinkingly, at the dawn of the radio age. So if we wanted to, we could start broadcasting programs specifically designed to make a good impression on our cosmic neighbors (to compensate for the fact that we’ve already sent them all 98 episodes of “Gilligan’s Island”).

Of course, the aliens might be very different from us. They might have four arms, or two heads. Hard as it is to imagine, they might not even watch television. The evolution of science fiction shows a steady broadening of our conception of what a sentient being could be: a hive-mind, a world-mind, a super-intelligent shade of the color blue… you name it. There could be minds that don’t use language or understand the world through pictures, and we might have a lot to learn from those sorts of minds. But if we humans want to be less isolated in the universe, what better aliens for us to reach out to, in our first attempts at interstellar communication, than ones who resemble us? You have to start somewhere.

I’m now going to summarize the plot of Sagan’s novel. Sort of. You’ll want to read my summary even if you already read the book because I take some liberties with what Sagan wrote, and the main theme of my essay hinges on those deliberate discrepancies. And if you only saw the movie, definitely read my summary, because the book and movie differ in some key respects.

In the novel *Contact*, humanity receives a signal from Vega, a star twenty-six light years away. The signal seems to be just the base-two representations of the first 261 primes (see my essay “The Clatter of the Primes” from last month), and since no known physical process generates base-two representations of primes, the message seems to indicate the presence of a mind.
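Just to make the signal concrete, here's what its opening layer would look like; this little generator is my own illustration, not anything from Sagan's text:

```python
def primes(count):
    """First `count` primes, by trial division (fine at this scale)."""
    found = []
    n = 2
    while len(found) < count:
        if all(n % p for p in found):  # not divisible by any smaller prime
            found.append(n)
        n += 1
    return found

# the opening layer of the fictional signal: primes, written in base two
for p in primes(5):
    print(p, bin(p)[2:])
```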

At first it appears that the message from the stars might be nothing more than a Cosmigram saying “I like primes. Do you like primes too? Send proofs.” (See Endnote #2.) But the message turns out to be much more than that; it’s composed of several layers of increasing complexity. Beneath the All-Primes,-All-The-Time layer is a faithful echo of Earth’s first TV broadcast, and beneath that is an instruction manual for building a machine, written in a language that we’re able to decipher because it’s heavily based on mathematics.

The Machine turns out to be a single-use round-trip transportation system that takes five lucky humans to an amazing chocolate factory in the middle of our galaxy, with other stops along the way, by way of wormholes (aka Tunnels) through spacetime. At the climax of their tour, the five humans encounter mind-reading, shape-shifting aliens (or maybe just aliens who’ve developed super-advanced technology for messing with people’s heads). The aliens describe themselves as mere Caretakers who hope that someday the Tunnel Makers will return and explain the meaning of Life, the Universe, and Everything, and while they’re at it, resolve certain mathematical mysteries that have them stumped. Even though the Caretakers are much smarter than us, the universe still fills them with feelings of wonder and awe, or to use a fancier phrase, A Sense Of The Numinous. As one of the Caretakers explains to Dr. Arroway, the novel’s protagonist:

“I don’t say this is it exactly, but it will give you a flavor of our numinous. It concerns the number one-third: the ratio of the counting number one to the counting number three. You know it well, of course; it is a non-decimal fraction, and you also know that you can never come to the end of its decimal expansion. Our scientists have found patterns in the digits of one-third in various bases, and we think it contains a message from the Creator of the Universe.”

At the end of the book, Arroway, having returned to Earth, pulls up the computer code that helped decipher the message from Vega and repurposes it to compute the base-eleven expansion of 1/3 and look for patterns. And sure enough, billions of digits out, she finds a string of 0’s and 1’s whose length is exactly one million. Turning that string into a thousand-by-thousand square of black and white pixels, she discovers a picture of a circle divided into three equal pieces, in perfect concordance with the concept of one-divided-by-three. Arroway has broken the code in the number one-third and has glimpsed the Unity underlying Reality; her long journey of discovery is at an end. Or is it only beginning?

I’m sorry; I lied in several places. I did warn you about that, didn’t I? The first lie I told was about that chocolate factory in the middle of the galaxy (if you missed it, you were skimming way too quickly). The second lie was my rendering of the conversation between Arroway and the alien. Here’s what the alien actually says (see Endnote #3):

“I don’t say this is it exactly, but it will give you a flavor of our numinous. It concerns pi, the ratio of the circumference of a circle to its diameter. You know it well, of course, and you also know that you can never come to the end of pi.”

Ah, so the aliens are entranced by pi! That makes *so* much more sense than having them go all gooey over one-third!

Don’t get me wrong, one-third is a pretty interesting number, but it lacks the cachet of pi. And not just because of pi’s Aegean mystique. Although both 1/3 and pi have decimal expansions that go on forever, they go on forever in entirely different ways. The digits of pi are varied and enigmatic, whereas the digits of the fraction 1/3 are dully predictable: it’s just 3’s all the way out. Other fractions are more interesting, like 22/7 (not far from pi on the number-line); its decimal expansion promisingly begins 3.14 but it too repeats eventually: 22/7 equals 3.142857142857142857… What’s more, 1/3 isn’t repetitive merely in base ten; its expansion in any integer base must repeat.
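Grade-school long division makes the contrast vivid; here's a sketch of mine that churns out digits of a fraction in any base, showing that 1/3 is periodic in base ten and base eleven alike, while 22/7 falls into its six-digit cycle:

```python
def digits_after_point(num, den, count, base=10):
    """First `count` digits of num/den after the point, in the given
    base, computed by grade-school long division on the remainders."""
    digits = []
    r = num % den
    for _ in range(count):
        r *= base
        digits.append(r // den)
        r %= den
    return digits

print(digits_after_point(1, 3, 8))           # [3, 3, 3, 3, 3, 3, 3, 3]
print(digits_after_point(22, 7, 12))         # [1, 4, 2, 8, 5, 7, 1, 4, 2, 8, 5, 7]
print(digits_after_point(1, 3, 6, base=11))  # [3, 7, 3, 7, 3, 7]: periodic in base eleven too
```

Since there are only finitely many possible remainders, some remainder must eventually recur, and from that point on the digits cycle; that's why *every* fraction repeats in *every* integer base.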

On the other hand, it’s known that pi is irrational, which means that its decimal expansion never settles into a repeating cycle: the digits never stop offering novelty of some sort or other. In fact, mathematicians have found no discernible patterns in the digits of pi, and it’s believed (though not proved) that every finite sequence of digits will eventually turn up if you compute pi to enough decimal places. (See Endnote #4.)

My third lie (an elaboration of my second lie) concerns what happens when Arroway does her own computer experiments. Here is what she finds in the book’s (real) concluding paragraph when her computer computes enough digits of pi in base eleven:

“Hiding in the alternating patterns of digits, deep inside the transcendental number, is a perfect circle, its form traced out by unities in a field of naughts. The universe was made on purpose, the circle said. In whatever galaxy you happen to find yourself, you take the circumference of a circle, divide it by its diameter, measure closely enough, and uncover a miracle – another circle, drawn kilometers downstream from the decimal point. There would be richer messages farther in. As long as you live in this universe, and have a modest talent for mathematics, sooner or later you’ll find it. It’s already here. It’s inside everything. You don’t have to leave your planet to find it. In the fabric of space and in the nature of matter, as in a great work of art, there is, written small, the artist’s signature. Standing over humans, gods, and demons, subsuming Caretakers and Tunnel builders, there is an intelligence that antedates the universe.”

Ah yes, “transcendental”: what a wonderful math-word! It gives you goosebumps even if you don’t know what it means. Irrational numbers like the square root of two already stir awe, but transcendental numbers bring that awe to a whole new level. A number like the square root of two, when expressed as a decimal, has the same enigmatic aspect as pi, but the square root of two can be described by a simple algebraic equation: *x*^{2} = 2. Pi, on the other hand, satisfies no such equation; it transcends mere algebra. How much more fitting an object of veneration it is than a pedestrian number like one-third! How worthy a vessel it is for a message from an Artist who transcends mere Tunnel builders, who in turn transcend mere Caretakers, who in turn transcend merest us! And what better way to show one’s power than to bend an entire universe?

But not so fast. There’s something not quite right with Sagan’s conceit, and the best way to explain what’s wrong is to go back to my rewrite.

My version of the plot (the version in which pi gets replaced by 1/3) is dramatically unsatisfying, but there’s something worse about it from a mathematical point of view. After all, when you apply long division to divide 1 by 3, you keep computing 10 ÷ 3 = 3r1 (that is, three goes into ten three times, leaving a remainder of 1), and that 1 feeds back into the process by becoming a 10, over and over, in a vicious cycle. It’s hard to imagine a universe in which the calculation process burps and suddenly starts producing a string of 0’s and 1’s before going back to producing 3’s. The scenario becomes even less plausible when you stop to consider how many different mechanisms there are for computing in our world. What alternative laws of physics would cause all the different machines that might compute the digits of 1/3 in all kinds of different ways to burp in unison?

If you think (as I do) that the decimal expansion of 1/3 doesn’t depend on what universe you happen to be in, then I ask you: How are the digits of pi different?

You might reply “1/3 is an arithmetic quantity, whereas pi is a geometric quantity derived by measuring circles, and we can certainly imagine a Creator who warps those circles.”

If you say that, then you’re in good company; most mathematicians up until the Renaissance would have tended to view pi as a purely geometrical construct. And Sagan seems to think so too (see Endnote #5) when he says you can approximate pi if you “measure closely enough”.

But Arroway’s computer doesn’t measure circles. It’s a digital computer, doing digital calculations. And in fact, if our current theories of the universe are correct, there is no way, even in principle, to get more than a few dozen digits of pi by doing measurements; between the Scylla of general relativity and the Charybdis of quantum effects, you can’t just build a big circle and measure it ultra-accurately. Gravitational warping of space-time or the uncertainty principle or both will thwart your efforts.

What exactly is Arroway’s computer doing? Probably something like computing a really long decimal approximation to 16 arctan 1/5 and subtracting an equally long decimal approximation to 4 arctan 1/239 from it, where arctan *x* can be approximated by taking partial sums of the infinite series *x* – (1/3) *x*^{3} + (1/5) *x*^{5} – (1/7) *x*^{7} + … This series was first discovered by the Indian mathematician Madhava of Sangamagrama in the late fourteenth century, though it’s often attributed to the European mathematician James Gregory who rediscovered it two and a half centuries later. The sum appears to have nothing to do with circles. Geometry has been banished; instead, we have an infinite arithmetic expression. The switcheroo can be justified by calculus if we assume that Euclid’s axioms are correct and that arc length satisfies an additional axiom due to Archimedes. (See Endnote #6.)
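A few lines of code show how thoroughly geometry has been banished; this is my own sketch of the computation just described (Machin's arctangent formula, with the Madhava–Gregory series doing the arithmetic), with no circle measured anywhere:

```python
from math import pi  # only used at the end, to check our answer

def arctan_series(x, terms):
    """Partial sum of the Madhava-Gregory series
    arctan(x) = x - x^3/3 + x^5/5 - x^7/7 + ..."""
    total, power = 0.0, x
    for k in range(terms):
        total += (-1) ** k * power / (2 * k + 1)
        power *= x * x
    return total

# Machin's formula: pi = 16 arctan(1/5) - 4 arctan(1/239)
approx = 16 * arctan_series(1 / 5, 10) - 4 * arctan_series(1 / 239, 10)
print(approx)            # 3.14159265358979...
print(abs(approx - pi))  # far below 1e-10 with just ten terms of each series
```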

If you have trouble imagining a universe in which 1/3 has a different decimal expansion than the one you were taught in school, then you should likewise have trouble imagining a universe in which arctan 1/5 or arctan 1/239 has a different decimal expansion, since the same sort of mechanical processes are involved; there’s a difference in degree of complexity, but not a difference in kind. So you should also have trouble imagining a universe in which 16 arctan 1/5 minus 4 arctan 1/239 has a different value. But that’s the universe of *Contact*.

The arctan formula for pi that I gave above is far from unique; at this point in mathematical history we know thousands of other formulas like it. Although the number pi originated in geometry, it has been fully liberated from geometry and from the peculiarities of the specific kind of geometry that we find in the physical universe. Or maybe some would say that pi has been imprisoned in calculus! Either way, it’s been transformed, and Arroway seems unaware of this transformation when she muses “If there was content inside a transcendental number, it could only have been built into the geometry of the universe from the beginning.”

But couldn’t there be universes built on fundamentally different geometries than ours? Indeed there could, and I wrote about some of these geometries in my essay “Three-point-one cheers for pi”. The problem with all of the geometries I know about, in terms of grounding the thrilling conclusion of Sagan’s novel in some mathematical plausibility, is that they don’t really take you away from the mathematicians’ pi; you’re still stuck in pi’s gravity-well. Sure, you can imagine a Creator who can bend space everywhere, in the fashion envisioned mathematically by Gauss and Riemann and physically by Einstein. But these sorts of geometries are still locally Euclidean, which means that as you probe at smaller and smaller scales, you recover Euclidean geometry and its pi. Indeed, in these geometries, the ratio of a circle’s circumference to the circle’s diameter varies as the circle grows or shrinks, so you could say that in those spaces, pi ceases to exist as a constant. Bending space breaks pi.

A more radical adjustment of geometry involves bending the exponent 2 in the Pythagorean theorem. Mathematicians who study such spaces call the exponent *p* and call the spaces *L*^{p} spaces. Just because the space we’re living in happens to be an *L*^{2} space (putting aside Einstein’s corrections) doesn’t mean we can’t imagine other possibilities! But there’s no escaping pi. If you were a scientist living in an *L*^{p} space, there’d be nothing to stop you from considering values of *p* other than the one governing *your* universe, and you’d be naturally led to ask “Which value of *p* makes the circumference-to-diameter ratio as small as possible?” You would discover that the answer is *p*=2, which would inexorably lead you to discover the pi of calculus. For that matter, you can build a computer in the *L*^{∞} universe using John Conway’s Game-of-Life rules and program it to compute “the pi of calculus”; even though “geometrical pi” in the *L*^{∞} universe is 4, the machine will spit out 3.1415…, not 4.0000…
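You don't have to take my word for the claim that *p* = 2 minimizes the ratio; it's easy to estimate the *L*^{p} circumference numerically. Here's a rough Python sketch of my own, parametrizing the unit *L*^{p} circle as (cos^{2/p} *t*, sin^{2/p} *t*) and measuring its length in the *L*^{p} metric itself:

```python
import math

def lp_dist(dx: float, dy: float, p: float) -> float:
    """Distance between two points in the L^p metric."""
    return (abs(dx) ** p + abs(dy) ** p) ** (1 / p)

def pi_p(p: float, n: int = 4000) -> float:
    """Circumference-to-diameter ratio of the unit L^p circle |x|^p + |y|^p = 1,
    with length measured in the L^p metric (the diameter is 2)."""
    quarter, prev = 0.0, (1.0, 0.0)
    for i in range(1, n + 1):
        t = (math.pi / 2) * i / n
        cur = (math.cos(t) ** (2 / p), math.sin(t) ** (2 / p))
        quarter += lp_dist(cur[0] - prev[0], cur[1] - prev[1], p)
        prev = cur
    return 4 * quarter / 2  # four quadrants of circumference over diameter 2

for p in (1, 1.5, 2, 3, 10):
    print(p, round(pi_p(p), 4))
```

The printed ratios dip down to 3.1416 at *p* = 2 and climb back toward 4 as *p* heads to 1 or toward infinity, just as claimed.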

If you want to imagine designer universes bearing the signatures of the artists who made them, by all means do so! Math is about imagination, after all. But don't expect anything as humdrum as circles in Riemannian manifolds with variable curvature or *L*^{p} spaces to make this numinous vision concrete. Something weirder will be required. (Anyone know of any math like that? I'm willing to admit that such a mathematics might exist, but there's a difference between imagining that such a mathematics exists and actually exhibiting it.)

I can think of two other constants suited to the kind of numinous treatment Sagan describes, and as it happens, they are denoted by the Greek letters at opposite ends of the alphabet.

The first is the dimensionless physical quantity alpha known as the fine structure constant. It’s roughly 1/137. Physicists know of no reason why it couldn’t have a different value than it does, though if it were much different we wouldn’t be here pondering it, since a universe with a different value of alpha wouldn’t support our kind of life or even our kind of chemistry. One can imagine a God with the power to twirl a Dial and make alpha anything She wants, and who chooses a specific value of alpha that pleases Her, perhaps one that conveys some sort of message of love to the inhabitants of the universe She created. Unfortunately, we can’t currently measure alpha to more than about a dozen digits, and there are likely to be fundamental limits to how closely anyone can ever determine it, so if the value of alpha contains a message from our Creator, it’s a very short one. On the other hand, if we ever find a formula for alpha, so that we can determine more of its digits through computation than we could by measurement, then alpha will become imprisoned in calculus just like the modern mathematician’s pi. Alpha will become a mathematical constant, not a physical constant; it’ll be the answer to a specific mathematical question, and not something whose value we might imagine being adjusted by a Dial.

The mathematical constant pi does not transcend human understanding, but I can tell you about another number that does, a *truly* transcendent number, a number whose merest operational parameters pi is not worthy to calculate. I speak of the number Omega. (See Endnote #7.)

We humans have many algorithms for approximating pi as closely as we wish, but we do not have, and indeed *cannot* have, an algorithm that approximates Omega as closely as we wish. That’s because Omega was defined by Gregory Chaitin in a fashion that’s intimately tied up with the unsolvability of the halting problem; this result, proved by Alan Turing, is a near-relative of Kurt Gödel’s incompleteness theorem. To have an algorithm that computes Omega to any specified accuracy, we’d have to have a way to solve every mathematical problem that ever has been or ever could be formulated, and Gödel’s theorem bars our entrance to that paradise of omniscience. The same goes for any alien species, no matter how intelligent. Each will ultimately be defeated by the riddles the Omega-sphinx asks travelers, for she knows infinitely many riddles, each harder than the one before; no finite being can pass her.

Have I whetted your appetite for more information on Omega? If so, good! I think that in some places in his book Sagan makes the mistake of explaining too much, so let me maximize your sense of wonder over Omega by explaining too little.

If I were writing a book like Sagan’s, I’d do something different near the end. I’d have the alien say:

“We’ve received a different signal, not using any kind of radiation you humans know about, emanating from everywhere at once or from beyond the universe (which are two different ways of saying the same thing). It’s an infinite string of 0s and 1s, and it seems to be the base-two representation of the number you humans call Omega. That is, it seems to reveal, for each finite computer program, whether that computer program ever halts. We can’t verify that that’s what it’s truly doing, but it sure seems like it. For the last million years we’ve been stuck at verifying the 97th bit; we’re hoping to find new approaches to understand why a particular program that seems to run forever actually does. It’s related to a simple-sounding yet fiendishly difficult problem that your species solved in 1994, so we’re hoping that someday you’ll be able to help us with it! Anyway, if the Message really is the binary representation of Omega, then it can’t have been created by any finite mind in our universe. It must be the product of some sort of infinite mind outside of Time.”

I agree that Omega is a lot more arcane than pi. But I think it’s a lot more mind-blowing. In fact, I’d call it pretty effing ineffable.

Mathematicians have mined a ton of numinosity from the true mathematics of pi, even without finding patterns in its decimal digits. For instance, consider the way pi comes up in statistics in the definition of the Gaussian distribution. Eugene Wigner, in his famous 1960 essay “The Unreasonable Effectiveness of Mathematics in the Natural Sciences,” wrote:

“There is a story about two friends, who were classmates in high school, talking about their jobs. One of them became a statistician and was working on population trends. He showed a reprint to his former classmate. The reprint started, as usual, with the Gaussian distribution and the statistician explained to his former classmate the meaning of the symbols for the actual population, for the average population, and so on. His classmate was a bit incredulous and was not quite sure whether the statistician was pulling his leg. “How can you know that?” was his query. “And what is this symbol here?” “Oh,” said the statistician, “this is pi.” “What is that?” “The ratio of the circumference of the circle to its diameter.” “Well, now you are pushing your joke too far,” said the classmate, “surely the population has nothing to do with the circumference of the circle.””

And it’s not just in statistics that pi makes an unexpected guest appearance; these cameos occur all over the place in math. You might say that pi is a Tunnel through the hyperspatial mathematical landscape, serving as a magical bridge between seemingly far-flung domains in inner space. It and other bridges across the land of pure imagination called Mathematics are good enough for me, until the aliens contact us.

When the aliens contact us or vice versa, I’ll be eager to learn what alien math is like. How different might it be? People often ask whether math is created or discovered, and I’m very sympathetic to the latter view because, despite years of strenuous engagement with the terrain of mathematics, I’ve developed no ability to bend it. Some people think that the solidity of our planet’s mathematical consensus is evidence that mathematical reality is in some deep way objective. But this sociological evidence of the objectivity of mathematics is tainted by the fact that humans have been sharing mathematical ideas from culture to culture for thousands of years, and some of the uniformity of our perceptions could be a consequence of our interactions.

But if aliens had math that looked like ours, developed entirely separately from our mathematics — now *that* would be very strong evidence that math is built into the fabric of our universe, and maybe built into the fabric of whatever logical infrastructure permits universes to exist in the first place.

There’s a video series on the question of whether math is discovered or invented. I haven’t watched it, and I don’t plan to. For one thing, if you look at the thumbnails of the videos, you’ll notice that the speakers aren’t just mostly old white males; they’re all *Earthlings*! I think I’ll want to hear from some extraterrestrials before I form an opinion on the question. But, just as importantly, I think the question is premature. Do we even know what math is yet? It’s still early days. Our species’ journey of mathematical discovery is only just beginning.

*Thanks to Sandi Gubin.*

Hugh Aldersey-Williams, “The Uncertain Heavens: Christian Huygens’ Ideas of Extraterrestrial Life” at publicdomainreview.org.

George Dillard, “A Golden Age — of Belief in Extraterrestrials” at historyofyesterday.com.

Martin Gardner, “Chaitin’s Omega,” chapter 21 in *Fractal Music, Hypercards and More: Mathematical Recreations from SCIENTIFIC AMERICAN Magazine*.

Carl Sagan (ed.), *Communication with Extraterrestrial Intelligence*, 1973.

Carl Sagan, *Contact*, 1985.

#1. There’s a poetic aptness to using the Pythagorean theorem as a way of contacting hypothetical Moon-dwellers, since the Pythagorean philosopher Philolaus of Croton believed the moon was inhabited. Then again, the Pythagorean theorem was known in many parts of the world long before Pythagoras was born, and it’s not even clear whether the proof given by Euclid was discovered by the Pythagorean school.

#2. On the other hand, the plaque Sagan sent on Voyager, featuring nude humans waving hello, could be construed as a solicitation of a much less innocent kind.

#3. In reply, Arroway says “But this is just a metaphor, right?” but gets no direct answer. Also, the physicist Eda, one of the other four travelers, is told the same story, but about a class of transcendental numbers Arroway hadn’t heard of. Sagan chooses to be a bit vague here, as well he should, since nothing dispels numinosity more than revealing too much.

#4. Pi reminds me a bit of Borges’ “Library of Babel” which contains every possible book. If you convert Sagan’s novel into 0’s and 1’s, then that string of 0’s and 1’s appears somewhere in the decimal expansion of pi, if our current guesses about pi are correct. Of course, pi would also contain far more numerous erroneous versions of the novel, and many, many more copies of my bogus summary if only because it’s much much shorter. Pi also reminds me of the “Infinite Monkey Theorem”.

#5. I like to think that Sagan the scientist knew everything I say in this essay, but that Sagan the novelist pretended not to know it for the sake of crafting a more accessible and appealing story.

#6. One version of Archimedes’ axiom is that if you have convex regions *A* and *B*, with *A* containing *B*, then the perimeter of *A* exceeds the perimeter of *B*. For instance, if *A* is a square of side-length 2, and *B* is the disk of diameter 2 inscribed in *A*, and *C* is the regular hexagon of side-length 1 inscribed in *B*, then the perimeters of *A*, *B*, and *C* are 8, 2pi, and 6, respectively, proving that pi lies between 3 and 4. So the first digit of pi, at least, has some sort of intuitive geometric meaning, even if the later digits seem meaningless to our puny human minds.

#7. In fact, there is not a single Omega number; rather there are infinitely many, one for each prefix-free universal computable function *F*. But we know of some fairly simple *F*’s, so it’s common to informally assume we’ve agreed on one of them without specifying which. This doesn’t bother anyone, though it means you’ll have some trouble figuring out what day of the year to designate as Omega Day.

Plus said to Times “No offense, friend, but I’m just better at building numbers than you are. Starting from 1, the smallest number, I can build lots of new numbers: 1+1 is 2, 1+1+1 is 3, and so on. But look at you! 1×1 is just 1. 1×1×1? 1 again. And so on. Boring!”

Times naturally became defensive. “Now that’s just not fair. You’re using the wrong building block. Instead of 1, try 2.” And the number 2 began to twinkle. “2×2 is 4. 2×2×2 is 8. And so on. See, I get new numbers, just like you, and mine are bigger than yours!”

Plus said “I can get all those numbers, and more; it just takes me longer. But I get some numbers you can’t get. 3 is 1+1+1, but you’ll never get 3 by multiplying 2’s.”

Times, thinking quickly, retorted, “I never said I could get everything from 2’s. I also use 3 as a building block.” Then the number 3 began to twinkle. “For instance, with 2 and 3, I can get 2, 4, 8, and so on, and 3, 9, and so on. And I mix 2’s and 3’s, so I get 6 and lots of other numbers too.”

Plus said “What about 5? How do you get 5 by multiplying 2’s and 3’s?”

Times airily answered “Oh, I never said 2 and 3 would be enough! 5 is another one of my building blocks.” And the number 5 began to twinkle.

Plus asked “How many of these building blocks do you have?”

Times didn’t answer right away. As I said, numbers and operations were still figuring themselves out. But Times wasn’t going to retreat from this showdown with Plus; Times had to prove that Times, like Plus, had an orderly way of building up all the numbers.

**PRIME NUMBERS**

The building blocks that Times needed are called prime numbers. The ancient Greeks got interested in prime numbers and figured out that there are infinitely many of them, though they didn’t phrase it that way; they said something closer to “No matter how many primes you’ve found, there’s always a prime you haven’t found.”

Before we see why you can never run out of primes, let’s make a table that shows how the numbers 1 through 10 can be written as products of primes. I’ve left a blank in the first row, because 1 is a special case: it’s sometimes called the product of no primes at all, but if that doesn’t make sense to you, don’t worry about it.

| *n* | *n* as a product of primes |
|-----|-----|
| 1 | |
| 2 | 2 |
| 3 | 3 |
| 4 | 2×2 |
| 5 | 5 |
| 6 | 2×3 |
| 7 | 7 |
| 8 | 2×2×2 |
| 9 | 3×3 |
| 10 | 2×5 |

One pattern in the table is that no prime occurs in two consecutive rows. You don’t see a 2 in two consecutive rows because when two counting numbers are consecutive, one of them is a multiple of 2 and the other one is not. Likewise you don’t see two consecutive counting numbers that are both multiples of 3, because the multiples of 3 are spaced three apart, like the downbeats in a waltz. The same goes for 5 and 7, and if we had a bigger table, it would apply to larger primes as well. The primes that are factors of the counting number *n* cannot be factors of the counting number *n*+1.

Putting it differently: From an additive-construction perspective, the numbers *n* and *n*+1 are very similar (just tack on an extra “+1” at the end of Plus’s way of writing the number *n* and you’ve got Plus’s way of writing *n*+1), but from a multiplicative-construction perspective, *n* and *n*+1 are as different as mosques and mosquitos (to pick two words that are adjacent in the dictionary but have nothing to do with each other). Their representations as products are as different as can be.
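This mosques-and-mosquitos phenomenon is easy to check by machine. Here's a small Python sketch of my own that factors consecutive numbers and confirms they never share a prime (equivalently, that consecutive numbers are always coprime):

```python
from math import gcd

def prime_factors(n: int) -> list:
    """The primes whose product is n, with repeats (empty list for n = 1)."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# Times's way of writing n tells you nothing about Times's way of writing n+1:
for n in range(1, 10000):
    assert gcd(n, n + 1) == 1  # consecutive numbers share no prime factor
    assert set(prime_factors(n)).isdisjoint(prime_factors(n + 1))

print(prime_factors(720), prime_factors(721))  # [2, 2, 2, 2, 3, 3, 5] [7, 103]
```

The loop never trips an assertion: however similar *n* and *n*+1 look to Plus, their factorizations are completely disjoint.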

**PRIMES WITHOUT END**

Now I can explain why you’ll never run out of primes. Take all the primes you know, find a big number *n* that’s a multiple of all of them (say by multiplying them all together), and then add 1 to that big number. Your new, ever-so-slightly bigger number, *n*+1, can’t be a multiple of any of the primes you know (because *n* was a multiple of all of them), so it’s either a prime itself (one that you didn’t know) or it’s a product of two or more primes (also previously unknown to you). Either way, there are new primes to meet. (See Endnote #1.) So, for example, 2×3 + 1 is a new prime (hello 7) and 2×3×5 + 1 is a new prime (hello 31), and 2×3×5×7×11×13 + 1 is the product of two new primes (hello 59 and 509).
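The recipe in the paragraph above can be carried out in a few lines of Python (a sketch of my own, using trial division, which is plenty fast at this scale):

```python
def prime_factors(n: int) -> list:
    """The primes whose product is n, found by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def euclid_step(known_primes: list) -> list:
    """Multiply the primes you know, add 1, and factor: every factor is new."""
    product = 1
    for p in known_primes:
        product *= p
    return prime_factors(product + 1)

print(euclid_step([2, 3]))                # [7]
print(euclid_step([2, 3, 5]))             # [31]
print(euclid_step([2, 3, 5, 7, 11, 13]))  # [59, 509]
```

Note that `euclid_step` sometimes hands you one new prime and sometimes several; the proof guarantees novelty, not primality of *n*+1 itself.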

There’s a song about this proof: check out “Plentitudinous Primes” by Hannah Hoffman and Joel David Hamkins.

Notice that even though the proof tells you *why* there’s got to be another prime besides the ones you know (or maybe several new primes), it doesn’t tell you *what* that new prime is (or what those several new primes are): *n*+1 could be prime, or it could be the non-prime product of two or more primes. The proof doesn’t tell you which. Mathematicians don’t really know to what extent one might expect the first situation to prevail as opposed to the second, though they’ve come up with some good guesses about special circumstances. For instance, say *n* itself is 2 times a prime (let’s write *n* = 2*p*). When 2*p*+1 is prime, we call *p* a Sophie Germain prime. Are there infinitely many Sophie Germain primes? Mathematicians think so, but they haven’t found a proof. Or, say *n* is a power of 2 (let’s write *n* = 2^{*k*}). When 2^{*k*}+1 is prime, we call it a Fermat prime. Are there infinitely many Fermat primes? This time mathematicians think not, but again they haven’t found a proof.
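Searching for small examples of both special cases takes only a few lines of Python (my own sketch): primes *p* with 2*p*+1 also prime, and exponents *k* with 2^{*k*}+1 prime.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; plenty fast for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Primes p such that 2p + 1 is also prime (Sophie Germain primes):
germain = [p for p in range(2, 100) if is_prime(p) and is_prime(2 * p + 1)]
print(germain)  # [2, 3, 5, 11, 23, 29, 41, 53, 83, 89]

# Exponents k such that 2^k + 1 is prime:
fermat_exponents = [k for k in range(1, 20) if is_prime(2 ** k + 1)]
print(fermat_exponents)  # [1, 2, 4, 8, 16]
```

The contrast is striking: Sophie Germain primes keep turning up as far as anyone has looked, while only five primes of the form 2^{*k*}+1 have ever been found (3, 5, 17, 257, and 65537, the Fermat primes).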

And that’s just one tiny corner of all the things we *don’t* know about primes, despite the efforts of a lot of hardworking and clever people. For instance, you may have heard about the twin prime conjecture, which concerns primes *p* for which *p*+2 is also prime (“twin primes”). Are there infinitely many twin primes? Mathematicians think so, but they haven’t found a proof.

**MUSIC OR NOISE?**

If you’re tired of my saying “Mathematicians think … but they haven’t found a proof”, then you probably won’t like number theory, which has a lot of we-don’t-knows. Number theory is the study of prime numbers. Well, not *just* prime numbers. It’s also the study of other things involving whole numbers, such as the question “When can an integer be written as the sum of two perfect squares?” But (surprise!) that question turns out to be a question about primes in disguise. And this phenomenon (hello again, prime numbers!) happens over and over when you ask questions about whole numbers.

Number theory is full of simple-sounding questions number theorists don’t have answers to, so if your archetype of mathematics is middle school arithmetic, where there’s a procedure for finding the answer to every question, or high school geometry, where every true proposition has a tidy proof, then number theory isn’t for you. In fact, if you suffer from trypophobia (an aversion to the sight of irregular patterns or clusters of small holes or bumps), then you should stop reading this essay right now, because “irregular patterns of bumps” is a pretty spot-on description of the primes.

Not everyone likes Stravinsky, either. His “Rite of Spring” caused a big to-do (though not a riot; see Endnote #3) in the spring of its premiere. Music-lovers are more used to dissonance nowadays than they were in 1913, but many people still find the piece unpleasant. Consider an eight-bar passage from near the beginning of the piece, about four minutes in. I can’t include an audio snippet here, but you can listen to a YouTube recording, such as the London Symphony Orchestra version, and advance to the four-minute mark if you’re in a hurry. Or you can listen to my rendition of this snippet using a synthetic piano (and an approximation to Stravinsky’s chord), if you’ve got a midi player:

http://mathenchant.org/082-rite.mid

Those accents on beats 10, 12, 18, 21, 25, and 30 don’t land when you expect them to, even when you *remember* that they don’t land when you expect them to (I’m guessing that this was the effect Stravinsky was aiming for). Sometimes the blows land two beats apart, sometimes three beats apart, sometimes more; we hear suggestions of order, only to have the patterns get broken.

The primes have some of that same unpredictability. Here I’ve rendered the numbers 1 through 720 using Stravinsky’s chord, where the primes are loud and the non-primes are soft.

http://mathenchant.org/082-primes.mid

If you listen closely, you’ll notice that every other beat (with one solitary exception) provides guaranteed safety against those accents, because all but one of the primes are odd. If you pretend you’re in fin-de-siècle Vienna and you listen to the MIDI file with your waltz-ears on, you’ll notice that (with one solitary exception) every third beat also provides safety, because only one of the primes is a multiple of three. And if you’re comfortable parsing music into groups of five beats or seven beats, you can hear a similar pattern of “safe” beats. (Alternatively, you can listen to the MIDI file

http://mathenchant.org/082-nonprimes.mid

in which I’ve accented the non-primes instead of the primes; now there’ll be certain beats on which you’re guaranteed to hear an accent, instead of guaranteed *not* to hear one.) But other than that, it’s hard for the musical mind to make sense of what it’s hearing. When rendered as a purely rhythmic composition, the primes form a music whose brief passages of order and pattern are only there to throw its savage randomness into sharper relief.

Except … the primes aren’t really random! Quite the opposite. Every individual number is either prime or isn’t; the primes are *exactly where they inevitably must be*. The irregularity of the primes and our attendant discomfort aren’t the result of choices made by a Stravinsky-ish Creator; the irregularity is an inexorable consequence of the laws of arithmetic, while the discomfort is the result of human nature and of our choice to look at (or listen to) the primes.

We can choose to look away; some people do. When Fermat tried to entice others into joining him on his forays into number theory, only a few of them took the bait. “We have no lack of better things to think about,” one of his pen-pals wrote. Even today some mathematicians react that way to the primes. And that’s fine! Once you get past the basics, mathematics branches into dozens of different subdisciplines, and there are enough different kinds of math to like that you can dislike two or even three of them and still be a respectable mathematician.

The study of primes is rife with frustration. Of course, frustration can be a wonderful thing if it’s inflicted not on you but on people who are trying to break into your house or hack into your computer accounts; the thorniness of primes turns out to be helpful when we design computer systems to thwart the efforts of bad actors. But that’s probably not news to you, if you’ve ever delved into the math that underlies cryptocurrency.

**LIOUVILLE AND CHOWLA**

But I do have actual news for you. Or at least, I can report on a discovery made less than a year ago by researchers Harald Helfgott and Maksym Radziwiłł. Their work concerns a question raised by the mathematician Sarvadaman Chowla back in 1965. Like all twentieth-century number theorists, Chowla was aware of the intriguing twin prime conjecture and equally aware that very little progress had been made toward resolving it. So he applied an old strategy of mathematical researchers: when faced with a problem you can’t solve, explore a related problem. Instead of looking at numbers that are 2 apart, as in the twin prime conjecture, Chowla looked at numbers that are 1 apart. Clearly they’re not both primes (leaving aside the case of 2 and 3), since one of them is even and 2 is the only even prime. Chowla’s question is about something called parity, or more precisely Liouville parity, named after the number theorist Joseph Liouville, who first studied it. The Liouville parity of *n* has to do with how many primes you multiply together to get *n*. For instance, we might say that 12 is “Liouville-odd” because we write it as 2×2×3, a product of three primes (it’s okay to repeat a prime but you have to count the repeats).

Number theorists don’t actually use the terms “Liouville-even” and “Liouville-odd” the way I did; instead, they write “*L*(*n*) = +1” and “*L*(*n*) = −1” according to whether the number *n* is the product of an even number of primes or an odd number of primes. The Liouville function of *n*, denoted by *L*(*n*), is defined as −1 to the power of the number of factors when *n* is written as a product of primes. (It’s convenient to regard 1 as the product of no primes at all, which is to say, zero primes, so that *L*(1) = (−1)^{0} = 1.)

It’s known that in the long run, half of the positive integers are Liouville-even and half are Liouville-odd. For instance, from 1 to 1000 there are 493 Liouville-even integers and 507 Liouville-odd integers, while from 1 to 1,000,000 there are 499,735 Liouville-even integers, and 500,265 Liouville-odd integers. Even as the absolute error grows (from ±7 to ±265), the relative error shrinks (from ±1.4% to ±0.1%), getting ever-closer to 0%.
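Liouville's function is simple enough to compute by brute force, and you can replicate the 493-to-507 split yourself. Here's a Python sketch of mine, using trial division:

```python
def liouville(n: int) -> int:
    """(-1) raised to the number of prime factors of n, counted with repeats."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            count += 1
            n //= d
        d += 1
    if n > 1:
        count += 1
    return -1 if count % 2 else 1

values = [liouville(n) for n in range(1, 1001)]
even = values.count(1)    # Liouville-even integers up to 1000
odd = values.count(-1)    # Liouville-odd integers up to 1000
print(even, odd)          # 493 507
```

Pushing the cutoff to 1,000,000 takes this naive version a while; a sieve that records each number's smallest prime factor speeds things up considerably if you want to check the 499,735-to-500,265 split too.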

Here’s what you get if on the *n*th beat you play a C if *n* is Liouville-even and D if *n* is Liouville-odd.

http://mathenchant.org/082-liouville.mid

I don’t hear any long-term patterns; do you?

In contrast, here’s what you get when you play a random sequence of C’s and D’s determined by tossing a coin (or rather by using a commercial pseudorandom number generator that does a decent job of simulating a fair coin):

http://mathenchant.org/082-random.mid

Does it sound the same to you? Or can your musical ear discern a difference between the noise of random numbers and the music of Liouville’s function?

Chowla’s question concerned the relationship, if any, between the Liouville-parity of nearby numbers. For instance we might compare the Liouville-parities of *n* and *n*+1. If the coin-toss analogy holds, we might expect that, among the first 1000 positive integers, each of the four possibilities (*n* is L-odd and *n*+1 is L-odd; *n *is L-odd and *n*+1 is L-even; *n* is L-even and *n*+1 is L-odd; *n* is L-even and *n*+1 is L-even) will occur about 250 times. In fact, these four outcomes occur 261 times, 246 times, 247 times, and 246 times, respectively. Not a bad fit. And if Chowla’s guess is right, we can expect the fit to get better and better (proportionately) when we replace 1000 by ever-larger cutoffs.
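These pair counts are equally easy to replicate. Here's a Python sketch of mine, with a trial-division implementation of Liouville's function:

```python
from collections import Counter

def liouville(n: int) -> int:
    """(-1) raised to the number of prime factors of n, counted with repeats."""
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            count += 1
            n //= d
        d += 1
    if n > 1:
        count += 1
    return -1 if count % 2 else 1

# Tally the Liouville parities of the pair (n, n+1) for n = 1 .. 1000.
pairs = Counter((liouville(n), liouville(n + 1)) for n in range(1, 1001))
for outcome, count in sorted(pairs.items()):
    print(outcome, count)
```

Each of the four outcomes lands reasonably near the coin-toss prediction of 250; Chowla's conjecture says the agreement keeps improving, proportionally, as the cutoff grows.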

**NUMBERS AND GRAPHS**

To find out how Helfgott and Radziwiłł improved on earlier results, check out Jordana Cepelewicz’s excellent article published in Quanta late last year. One of the main innovations of the new paper of Helfgott and Radziwiłł is the use of methods from graph theory to approach a problem from number theory. Number theory and graph theory were the two favorite subject areas of Paul Erdős; he would have loved to see them in such an intimate embrace.

Graph theory is often used to model social networks, and in a way, the math of Helfgott and Radziwiłł embodies a kind of social view of the positive integers, in which numbers “conspire” (or don’t conspire) to have related values of Liouville’s function.

If you know a little bit of applied graph theory, you may have heard of small-world networks, in which it’s possible to find a short chain of connections between any two nodes (think “six degrees of separation”). The work of Helfgott and Radziwiłł is based on a different way of measuring how well the connections bind the whole network together. The key word here is “expander”. I won’t give a technical definition, but I’ll give two examples of networks that fail to be expanders and one that does a decent job.

In each of our networks, the nodes being connected are the positive integers 1, 2, 3, … but the patterns of connections are different. In the first network, we create a one-way connection from *i* to *j* whenever *j*=*i*+1 or *j*=*i*+2. Thus, 1 gets connected to 2 and 3, 2 gets connected to 3 and 4, 3 gets connected to 4 and 5, etc.

Starting from 1, there are four journeys of length two we can take (1 to 2 to 3, 1 to 2 to 4, 1 to 3 to 4, and 1 to 3 to 5), leading us to three possible destinations: 3, 4, and 5. Taking three steps from 1, there are only four possible destinations: 4, 5, 6, and 7. The number of destinations we can reach in *k* steps is always equal to *k*+1. So as *k* increases, the number of destinations reachable in *k* steps gets bigger, but it does not increase quickly.

What if instead we create a one-way connection from *i* to *j* whenever *j*=2*i* or *j*=3*i*?

Once again, the number of destinations reachable in *k* steps, starting from 1, is always equal to *k*+1 — a linear function of *k*.

But now, let’s mix addition and multiplication. What if we create a one-way connection from *i* to *j* whenever *j*=2*i* or *j*=2*i*+1?

Then starting from 1 we can take a one-step journey to 2 or 3, a two-step journey to 4, 5, 6, or 7, a three-step journey to 8, 9, 10, 11, 12, 13, 14, or 15, etc. The number of destinations now grows *exponentially* as a function of *k*. That’s the kind of expansion that expander graphs have.
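The growth rates of all three toy networks can be tabulated with a few lines of Python (my own sketch; these are the examples above, not anything from the Helfgott–Radziwiłł paper):

```python
def reachable(start: int, neighbors, steps: int) -> set:
    """Nodes reachable from `start` in exactly `steps` one-way hops."""
    frontier = {start}
    for _ in range(steps):
        frontier = {j for i in frontier for j in neighbors(i)}
    return frontier

additive = lambda i: (i + 1, i + 2)        # j = i+1 or j = i+2
multiplicative = lambda i: (2 * i, 3 * i)  # j = 2i or j = 3i
mixed = lambda i: (2 * i, 2 * i + 1)       # j = 2i or j = 2i+1

for k in range(1, 8):
    print(k,
          len(reachable(1, additive, k)),        # grows like k+1
          len(reachable(1, multiplicative, k)),  # grows like k+1
          len(reachable(1, mixed, k)))           # grows like 2^k
```

The first two columns crawl along linearly while the third doubles at every step: mixing addition and multiplication is what buys exponential expansion.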

You’ll have to read Cepelewicz’s article if you want to know more, and I hope you will. The paper of Helfgott and Radziwiłł doesn’t settle Chowla’s original problem, but it gives stronger results than anyone has gotten before, and raises hopes that the new approach will yield other fruits as well.

**NUMBERS AND ORBITS**

Meanwhile, other researchers are approaching Chowla’s conjecture from a totally different direction, using another branch of mathematics that came of age in the 20th century: ergodic theory. Vitaly Bergelson, El Houcein El Abdalaoui, Joanna Kulaga-Przymus, Mariusz Lemanczyk, Redmond McNamara, Florian Richter, Thierry de la Rue, and others have been applying ideas that originally arose in Henri Poincaré’s work on celestial mechanics. I learned about this work quite recently in a talk given by Richter and I won’t attempt to even sketch it here. But I think the way different communities of researchers are using very different tools to study the weird ways of primes reveals something about the fundamental unity of mathematics. When one is tackling a truly deep and obdurate phenomenon like the statistical regularity that lies hidden in the primes, all of humanity’s best mathematical tricks have a role to play. I believe this tells us how deep mathematics is, and how limited the human mind is when it comes to plumbing those depths. After all, our brains didn’t evolve to do math; when you’re as ill-suited to abstract mathematics as humans are, every little bit helps!

If you ask me to guess which community will prove Chowla’s conjecture, I think that’s the wrong question. A better question is, Which community will prove Chowla’s conjecture *first*?, because it’s likely that multiple approaches will ultimately pay off. But even that question bothers me, because the first proof isn’t always the one that’s easiest to read or most illuminating. A better question might be, Which community will prove Chowla’s conjecture *best*? But I don’t like that question either. Why must math research be a contest? Mathematics is the richer for having multiple paths to its truths. For instance, take the Prime Number Theorem, one of the cornerstones of analytic number theory. The ergodic theory folks have found a lovely new way to prove it. But I wouldn’t want this proof to displace earlier proofs.

And now that I stop to think a bit harder about recent mathematical history, I think it’s quite possible that the first proof of Chowla’s conjecture will use ideas from graph theory *and* ideas from dynamics, and maybe some other areas of mathematics outside of number theory. Consider the work of Ben Green and Terence Tao on the existence of arithmetic progressions in the primes; it used ergodic theory, combinatorics (of which graph theory is a part), geometry, and harmonic analysis.

**THE LIMITS OF KNOWLEDGE**

There are many ways to understand Kurt Gödel’s famous First Incompleteness Theorem (and even more ways to misunderstand it). One approach, popularized by computer scientist Douglas Hofstadter in his book “Gödel, Escher, Bach,” is to view it as an illustration of the pitfalls of self-reference in formal systems. But you can also view the incompleteness theorem as illustrating how richly addition and multiplication interact.

If you create a mutilated version of number theory that includes addition but not multiplication (Presburger arithmetic), or one that includes multiplication but not addition (Skolem arithmetic), then you get a tidy mathematical universe in which all the questions you can ask can be answered without any need for creativity. It’s a lot like Leibniz’s rosy vision of a world in which questions of morality or public policy could be reduced to calculation. But if you include both addition and multiplication, then you’re in the realm of Peano arithmetic, and Gödel’s clever construction (based in part on the prime numbers) shows that you can generate assertions that are true (assuming that Peano arithmetic is consistent) but cannot be derived from the Peano axioms. Leibniz’s dream is doomed to failure, even within the precincts of mathematics.
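For a taste of how tame the multiplication-free world is, here is a sentence expressible in Presburger arithmetic (the formula is my own illustration, not drawn from the text). It asserts that every number is either even or odd, and because no variable is ever multiplied by another, a mechanical decision procedure can settle statements like it without any ingenuity:

```latex
\forall x \, \exists y \, \bigl( x = y + y \;\lor\; x = y + y + 1 \bigr)
```

Allow a term like $y \cdot z$ into the language and that guarantee evaporates: this is the realm where Gödel’s construction lives.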

Given the role that the primes played in toppling Leibniz’s dream, it’s not too paranoid to suspect that some properties of the prime numbers may elude human reason forever. The philosopher Timothy Morton coined the term “hyperobject” to denote a thing that is too large for the human mind to grasp. He had things like global warming in mind, but I think the set of prime numbers also qualifies.

**EPILOGUE**

Plus and Times looked up at the numbers that Times had set twinkling: 2, 3, 5, 7, and infinitely many others. And they saw that they were messy.

“Not as neat as the sequence of odd numbers,” said Plus.

“Not as tidy as the sequence of powers of two,” said Times.

They were quiet for a bit.

“Not as sweet as the sequence of perfect squares,” said Plus.

“Right,” said Times. “With squares, the spacing gets bigger and bigger as you go out. *These* numbers only get *mostly* farther apart as you go, and even then, only sort of.”

And they were quiet again for a while.

“Can’t say I like them much,” said Plus. “Bit of a jumble. Aren’t we arithmetic operations supposed to be about order, pattern, and regularity?”

“I don’t really like them either,” admitted Times. “I kind of regret making them now.”

“Yeah, well, but I *made* you make them. With my boasting, I mean,” said Plus, who then paused, as if the very next sentence would be hard to say. “But … it’s the strangest thing, but I kind of feel I *ought* to like them.”

“I feel the same way!” exclaimed Times. “Something inside me says I ought to try to *learn* to like them. I don’t want to be the sort of being who only likes things that are easy to like.”

Another long silence followed.

“Well,” said Plus to Times, “we certainly created a big mess when we came up with those primes! But I’m guessing that if we put our heads together we can figure them out.”

“We do work well together,” said Times to Plus shyly.

And the two of them rose up together, still talking, into a firmament bright with ineffable yet inevitable constellations.

*Thanks to Jeremy Cote, Noam Elkies, David Feldman, Rebecca Gans, J. Ruth Gendler, Sandi Gubin, Evelyn Lamb, and Evan Romer.*

**ENDNOTES**

#1: This argument is often presented as a proof by contradiction, along the lines of:

“We want to prove that the set of primes is infinite. Let’s suppose it’s finite and see where that supposition leads. If there are only finitely many primes, we can multiply them all together to get a number *n* that’s divisible by all the primes. But then *n*+1 is a number that’s divisible by *none* of the primes, and that’s a contradiction because every counting number can be written as a product of primes. So there must be infinitely many primes after all.”
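The heart of the argument, that *n*+1 leaves remainder 1 upon division by every prime on the list, is easy to check numerically. Here's a quick sketch (not from the original) using the first six primes:

```python
from math import prod

# Pretend, for contradiction's sake, that these are ALL the primes.
primes = [2, 3, 5, 7, 11, 13]
n = prod(primes)          # n = 30030 is divisible by every prime on the list
for p in primes:
    assert (n + 1) % p == 1   # n + 1 leaves remainder 1: divisible by none of them

# Note that n + 1 = 30031 is not itself prime (it equals 59 * 509), but its
# prime factors 59 and 509 are missing from the list -- which is the contradiction.
print(n + 1)
```

The example also illustrates a common misreading of the proof: it does not claim that *n*+1 is prime, only that its prime factors cannot appear on the supposedly complete list.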

This is the rhetorical gambit of *reductio ad absurdum*, in which one undermines a premise by showing that it leads to unacceptable conclusions. The resemblance to legal argumentation brings to mind the fact that Fermat himself was a jurist.

I recently learned about a wacky Delaware court case, “Joseph Alfred v. Walt Disney”, featuring a complainant who, in suing Disney for a purported breach of contract involving the idea of a flying car, somehow managed to invoke Euclid’s proof of the infinitude of the primes (as well as Star Trek, Star Wars, Game of Thrones, the epic of Gilgamesh, and other cultural touchstones). Judge Sam Glasscock archly summarized the complaint thus: “It is well-written and compelling. In fact, it can be faulted only for a single — but significant — shortcoming: it fails to state a claim on which relief could be granted. Therefore, I grant the Defendants’ Motion to Dismiss.”

But I find an additional shortcoming in the section of the complaint that mentions Euclid. Here’s what the complainant wrote:

“The Walt Disney Corporation created an implied contract with the plaintiff when it changed its own policy against submitting unsolicited submissions by a third party. The plaintiff can infer an implied promise based on circumstances that exist in the ordinary course of dealing and common understanding. Why even take the teleconference call on July 22, 2014 if there were not mutual agreement that the campaign would be successful for the Disney Corporation? There is an often used mathematical principle to solve difficult theorems: to prove something, disprove the opposite (See Euclid’s proof on the infinity of prime numbers).”

But Euclid’s proof as Euclid wrote it was *not* a proof by contradiction. All he showed was that if you have three primes you can always find a fourth, and he left the reader to apply the same idea to infer the more general fact that the prime numbers are more than any assigned multitude.

#2: It’s time I confessed to some authorial mischief, specifically a nomenclatural switcheroo, when I used the terms “Germain prime” and “Pierre Fermat prime”.

People usually call primes *p* for which 2*p*+1 is also prime “Sophie Germain primes” (as opposed to “Germain primes”), presumably to highlight the fact that Germain was a woman. Germain kept her gender secret during her studies, since at that time women in France weren’t allowed to attend universities. When Gauss, the pre-eminent mathematician of that age and a correspondent of hers, learned that the venerable “Monsieur LeBlanc” was in fact a woman, he was all the more impressed, and praised her for having “the noblest courage, extraordinary talent, and superior genius”.

On the other hand, people usually call primes of the form 2^{k}+1 “Fermat primes”, not “Pierre Fermat primes”.

Before we discuss the fact that mathematicians usually give Germain’s first name but not Fermat’s when talking about the classes of primes that bear their names, I should mention that Fermat’s story is linked to Germain’s in a deep way. Fermat is famous for his assertion that if *n* is a whole number bigger than 2, the equation *x^{n}* + *y^{n}* = *z^{n}* has no solutions in positive whole numbers. Germain made some of the first general progress toward proving this claim: she showed that if *p* is a prime for which 2*p*+1 is also prime, then the equation with exponent *p* has no solutions in which none of *x*, *y*, *z* is divisible by *p* (what’s now called the first case of Fermat’s Last Theorem).

Nor were Germain’s mathematical talents limited to number theory. She won a prize for presenting the first mathematical account of what’s going on with Chladni plates.

So why does the great mathematician Fermat get referred to by his surname only while the great mathematician Germain gets referred to by her full name? Probably for the same reason that the great novelist Austen is often called “Jane Austen”: the default mathematician, like the default novelist, is seen as male, so exceptions to the stereotype get marked.

My question is, does including Germain’s first name help or hinder the cause of equality for women? In an ideal world, free of gender prejudice, it would of course be pernicious to introduce different nomenclatural standards for men and women. But we don’t live in such a world, and until we do, maybe it’s good to remind ourselves and others that some great mathematicians were women.

Then again, think back to what went on in your mind (please be honest, if only with yourself) when I called them “Germain” and “Pierre Fermat”. Giving someone a first name brings them down to human scale; conversely, omitting that name turns them into someone severe, almost inhuman, chiseled out of granite. To put it starkly: “Prof. Germain” sounds like a colleague whose opinion you would defer to at department meetings, whereas “Sophie” sounds like a colleague whom you might ask to refill the department coffee maker.

When we selectively humanize women and not men in some profession, we diminish women collectively. Specifically, we make women seem warmer but less competent. Here I am leaning on ideas from social psychology that suggest that, although warmth and competence are compatible, we tend to see them as mutually exclusive, and to the extent that we see evidence that a person possesses one of the two traits, we tend to infer that they lack the other.

So what do you think? “Germain primes”, or “Sophie Germain primes”? And more broadly, what’s the best way to disable the gendered stereotype of mathematicians that we all tend to have regardless of our gender? Please share your open-minded, curious and polite thoughts in the Comments!

Oh, and getting back to Pierre, you may note that I called him “Pierre Fermat” not “Pierre de Fermat”, as is common. He earned the “de” when he became a government official in Toulouse, which I’m sure was very nice for him, but what does his title have to do with his mathematics?

#3: I have to confess that I uncritically accepted the standard legend of the opening night riot until Evelyn Lamb sent me a link to Linda Shaver-Gleason’s essay “Did Stravinsky’s The Rite of Spring incite a riot at its premiere?” I found the essay illuminating in many ways. Not only did it debunk a myth I’d believed for half a century, but it also situated “The Rite of Spring” in a broader cultural context. The most eye-opening part of the essay for me was a video showing a 1987 recreation of the original choreography: it made me realize that the jerky movements of the dancers could be construed as comical. I’d always thought of the piece as Serious And Important Art, but seeing those men in conical hats thrashing like oversized muppets made me understand the reaction of the aristocrats in the balcony (the ones whose laughter angered the music lovers, thereby setting in motion a disturbance that legend magnified into a riot). It’s a commonplace that a major ingredient of humor is surprise; that’s one of the things that makes the wrong notes in a P.D.Q. Bach piece funny. So it stands to reason that accents that arrive when you don’t expect them to could be funny as well.

And this led me back to thinking about the primes. Erdős once said “God may not play dice with the universe, but something strange is going on with the prime numbers,” and he is not the only mathematician to have thought so. Might mathematical culture someday reach a vantage point of sophistication from which primes are seen, not as funny-strange, but as funny-ha-ha?
