**TWO TRIGONOMETRIES**

Like many people, I learned two trigonometries, a few years apart. The first was about triangles and it had pictures like this:

The angle *θ* was seldom more than 90 degrees and never more than 180 degrees (because what kind of triangle has an angle bigger than 180 degrees?).

The second trigonometry was about circles and it had pictures like this:

Now *θ* was allowed to go all the way from 0 degrees to 360 degrees and beyond, as the black point went round and round counterclockwise. This trigonometry wasn’t about trigons (“three-cornered things”); it should have been called *cyclometry*, since it was about measurements of circles, and it was by extension a tool for understanding all processes that go round and round, including less overtly geometrical ones like the lengthening and shortening of days in the circular parade of seasons. This expansive version of trigonometry helps us understand many processes that go up and down, as long as they, like carousel horses, go up and down in a predictable and repetitive fashion:

This graph could show the rises and falls of two such horses but it’s actually a graph of the sine function superimposed with a graph of its twin, the cosine function.

Stop for a minute to wonder: how are such pictures made?^{1} We’ve come to take graphing calculators for granted, forgetting how much work is involved with plotting a function like *y* = sin *x*. Even computing the sine of *x* for a *single* value of *x* to, say, six digits is a computational feat all on its own, one that would have taxed the powers of Archimedes. We know that an electronic calculator doesn’t have a tiny person inside it, drawing and measuring triangles or circles. So what *does* happen inside your calculator when you press the “sin *x*” button?

The answer came five centuries before the question, and it came from the Kerala province of India.^{2}

**MADHAVA OF SANGAMAGRAMA (1350 – 1425)**

Madhava of Sangamagrama was one of the greatest mathematicians^{3} of the 14th century, yet we know little of his life. Indeed, none of his writings (if writings there were) survive; we know of his pioneering contributions only through commentaries written by his successors, who constituted the Kerala school of astronomy and mathematics.

Madhava discovered that the sine and cosine functions, which had been invented in something close to their modern form by the Indian mathematician Aryabhata a thousand years earlier, can be expressed as “polynomials” with infinitely many terms. Here (translated into modern notation) are Madhava’s formulas^{4}:

sin *x* = *x* − *x*^{3}/6 + *x*^{5}/120 − *x*^{7}/5040 + …

cos *x* = 1 − *x*^{2}/2 + *x*^{4}/24 − *x*^{6}/720 + …

Notice that as one reads each formula from left to right the exponent of *x* is increasing instead of decreasing, which makes sense because you can count up from zero but you can’t count down from infinity. These formulas look less random if we employ factorial notation, using “*n*!” (pronounced “*n* factorial”) to signify the product of the counting numbers from 1 to *n*. Then we can write

sin *x* = *x*^{1}/1! − *x*^{3}/3! + *x*^{5}/5! − *x*^{7}/7! + …

cos *x* = *x*^{0}/0! − *x*^{2}/2! + *x*^{4}/4! − *x*^{6}/6! + …

Here I’ve written the first term in the right hand side of the first equation as *x*^{1}/1!, and the first term in the right hand side of the second equation as *x*^{0}/0!, to better bring out the pattern that rules Madhava’s twin formulas.^{5}
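That pattern also answers the question about calculators, at least in spirit: a power series turns the geometry of circles into arithmetic a machine can do. Here is a minimal Python sketch of my own (the function name `madhava_sin` is my invention, and this is not what real calculator firmware does — see endnote 2) that sums the first twenty terms of the sine series and compares the result with the library routine:

```python
from math import sin, pi

def madhava_sin(x, terms=20):
    """Approximate sin(x) by summing Madhava's power series.

    Each term is built from the one before it: multiplying the
    k-th term by -x^2 / ((2k+2)(2k+3)) turns x^(2k+1)/(2k+1)!
    into -x^(2k+3)/(2k+3)!.
    """
    term = float(x)   # first term: x^1 / 1!
    total = 0.0
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

# For x near 0, a handful of terms already agrees with the
# library sine to more digits than a calculator displays.
print(madhava_sin(1.0))   # close to sin(1) = 0.8414709848...
print(sin(1.0))
```

Building each term from its predecessor, rather than recomputing powers and factorials from scratch, is what keeps the arithmetic cheap.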

One amazing feature of these formulas is that if you choose a value of *x* that’s far from zero, then each of Madhava’s two sums, after engaging in youthful excesses in which it trespasses far out into the rightward and leftward expanses of the real line, settles down to become a sober resident of the vicinity of 0, never again venturing to the right of +1 or to the left of −1. To see this dramatic phenomenon in action, suppose *x* is 10*π* radians, corresponding to an angle of 1800 degrees (five full turns). The terms of Madhava’s sum for the sine of 10*π* are initially medium-sized, starting with 31.4159, but quickly get large and oscillate wildly, jumping back and forth with alternating positive and negative steps of increasing size; the twelfth term of the sum is about negative one trillion, and the term after that is about positive two trillion. Yet eventually the terms start to get small, and astonishingly, the running sum of the terms gets closer and closer to zero. The positive and negative terms, no two alike and some of them upwards of a trillion, cancel out perfectly in the limit – as indeed Madhava’s formula says they must, since the sine of five full turns is precisely zero.
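You can watch the youthful excesses and the eventual sobriety numerically. Here is a quick sketch of my own, in ordinary double-precision floating point; because round-off keeps the trillion-sized cancellations from being exact, the final sum comes out tiny rather than exactly zero:

```python
from math import pi

x = 10 * pi   # five full turns, so the sine should be exactly 0
term, partial, largest = x, 0.0, 0.0
for k in range(60):
    partial += term
    largest = max(largest, abs(term))
    # same term-to-term recurrence as in the sine series
    term *= -x * x / ((2 * k + 2) * (2 * k + 3))

print(f"largest term encountered: {largest:.3e}")   # on the order of trillions
print(f"final running sum:        {partial:.3e}")   # tiny by comparison
```

This is also why, as endnote 2 mentions, nobody computes the sine of a large angle by feeding it straight into the power series.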

Sums like Madhava’s, involving unlimited powers of the independent variable *x*, are now called *power series*, and, like polynomials, they can be added and multiplied. Say we’ve got two power series

*a*_{0}*x*^{0} + *a*_{1}*x*^{1} + *a*_{2}*x*^{2} + …

and

*b*_{0}*x*^{0} + *b*_{1}*x*^{1} + *b*_{2}*x*^{2} + …

Then their sum is

(*a*_{0}+*b*_{0}) *x*^{0} + (*a*_{1}+*b*_{1}) *x*^{1} + (*a*_{2}+*b*_{2}) *x*^{2} +…

and their product^{6} is

(*a*_{0}*b*_{0}) *x*^{0} + (*a*_{0}*b*_{1} + *a*_{1}*b*_{0}) *x*^{1} + (*a*_{0}*b*_{2} + *a*_{1}*b*_{1} + *a*_{2}*b*_{0}) *x*^{2} + …

For a fun surprise, use power series multiplication to multiply

sin *x* = *x*^{1}/1! − *x*^{3}/3! + *x*^{5}/5! − *x*^{7}/7! + …

by itself, obtaining the first few terms of the power series representation of (sin *x*)^{2}. In a similar way find the power series representation of (cos *x*)^{2}. Now add those two power series. What do you notice? Should you have been surprised?^{7}
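If you’d rather let a machine do the bookkeeping for that exercise, here is a sketch using exact rational arithmetic; the truncation degree `N` and the helper names are my own choices, not standard library machinery:

```python
from fractions import Fraction
from math import factorial

N = 10   # truncate every power series at degree N

def multiply(a, b):
    """Product rule from the text: the coefficient of x^n is
    a_0*b_n + a_1*b_(n-1) + ... + a_n*b_0."""
    c = [Fraction(0)] * (N + 1)
    for i in range(N + 1):
        for j in range(N + 1 - i):
            c[i + j] += a[i] * b[j]
    return c

def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

# Madhava's coefficients: odd powers for sine, even powers for cosine,
# alternating signs, factorials in the denominators.
sin_c = [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 == 1 else Fraction(0)
         for n in range(N + 1)]
cos_c = [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 == 0 else Fraction(0)
         for n in range(N + 1)]

result = add(multiply(sin_c, sin_c), multiply(cos_c, cos_c))
print(result)   # every coefficient cancels except the constant term, 1
```

Massive cancellation leaves just the constant term 1: the Pythagorean identity (sin *x*)^{2} + (cos *x*)^{2} = 1, recovered by pure algebra, as endnote 7 discusses.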

**HOW MADHAVA (PROBABLY) DID IT**

We don’t know exactly how Madhava came up with these formulas. But we have some intelligent guesses based on what his successors wrote. It’s likely that Madhava’s chief insight was essentially an application of differential calculus, by way of pondering the question “How do the sine and cosine of *x* change if you make a small change in *x*?”

What Madhava probably noticed by scrutinizing trig tables and then probably proved^{8} is that when you make a small change in *x*, the resulting change in the sine of *x* is proportional to the cosine of *x*, while the change in the cosine of *x* is proportional to the sine of *x* but with a minus sign in front.

This mutuality between sine and cosine – the way changes in the sine function are governed by the cosine function, and vice versa – might seem like a vicious circle (pun intended), but in fact this two-way relationship turned out to be not a cul-de-sac but a mighty rotary engine that enabled Madhava to churn out the respective power series for sine and cosine, with each turn of the crank spitting out a new coefficient.^{9} It doesn’t take anything away from Newton and Leibniz to acknowledge the Kerala mathematicians’ huge accomplishment: they came up with the idea of differentiating a function before anyone else did, and they applied the process *in reverse* to unify algebra and trigonometry.

For more on power series, read Steven Strogatz’s excellent article “How Infinite Series Reveal the Unity of Mathematics”. If you’re specifically interested in understanding the hidden geometric meaning lurking in the individual terms of Madhava’s formulas, check out Mathemaniac’s awesome video “The geometric interpretation of sin *x* = *x* − *x*^{3}/3! + *x*^{5}/5! − …”. A more traditional approach can be seen in Gary Rubinstein’s presentation “Madhava sine series derivation”. An overview of the Kerala school and its accomplishments appears in the chapter on Madhava in Ian Stewart’s 2017 book *Significant Figures*. And if you want to learn from a mathematical historian what we know *and* what we don’t know about the Kerala school’s version of calculus, see Victor Katz’s article “Ideas of Calculus in Islam and India”.

So, what did Newton and Leibniz know that Madhava and his successors didn’t? That’s a question for someone who knows more mathematical history than I do. (If you’re such a person, please post to the Comments!) But I’d wager that one thing Madhava didn’t know is that his ideas could be applied to so many problems in science, such as optics (think of Pierre Fermat’s work on refraction) or time-keeping (think of Christiaan Huygens’s work on pendulums), or for that matter astronomy: how the Indian astronomer would have loved Newtonian celestial mechanics! But Madhava and his successors discovered the major ideas of calculus, applied them to trigonometry, and then didn’t apply their revolutionary methods to anything else. It’s as if the ancient Siberians who crossed the Beringia land bridge twenty thousand years ago had invented bicycles to make the mass migration go more smoothly, and then, having safely arrived in Alaska (“Whew!”), put all the bikes in their closets and forgot about them, not realizing that bicycles might have other uses.^{10}

**ISAAC NEWTON (1643 – 1727)**

The binomial theorem that we teach in pre-calculus courses, expressing (*x* + *y*)^{n} in expanded form as a sum of *n*+1 terms, goes back at least to Bhaskara II’s 12th-century treatise *Līlāvatī*, and versions that give less explicit descriptions of the coefficients go back several centuries further.

What Isaac Newton did around 1665, and wrote up a few years later in a privately distributed monograph “On Analysis by Equations Unlimited in their Number of Terms”^{11}, was introduce a version that was at once more specialized (one of the summands is equal to 1) and much more general (the exponent can be any rational number). Here’s Newton’s binomial theorem:

(1 + *x*)^{r} = 1 + *r* *x*^{1}/1! + *r*(*r* − 1) *x*^{2}/2! + *r*(*r* − 1)(*r* − 2) *x*^{3}/3! + …

with infinitely many terms on the right side of the equation. To get a sense of its power (pun intended), plug in *r* = 1/2, and recall (see the section from my essay “Denominators and Doppelgängers” called “The Principle of Permanence”) that the one-halfth power of a number is just a fancy name for the square root of that number:

(1 + *x*)^{1/2} = 1 + *x*/2 − *x*^{2}/8 + *x*^{3}/16 − …

(If you enjoyed squaring Madhava’s power series for the sine and cosine of *x*, try squaring Newton’s power series for the square root of 1+*x*. What do you get?) Plugging *x* = 1 into the formula, we get an equation whose right hand side is an infinite series whose partial sums get closer and closer to the square root of 2. And if instead of the square root of 2 you want the cube root of 2, Newton’s your guy: just replace the exponent 1/2 by the exponent 1/3 in his binomial theorem.

But these infinite series approximations to the square root and cube root of 2 don’t converge as quickly as we might hope, and that defect turns out to be a symptom of a much bigger problem: as soon as *x* becomes larger than 1, even an eensy-weensy bit larger, Newton’s infinite series for (1 + *x*)^{r} fails catastrophically. The terms just get bigger and bigger, and the approximation gets worse and worse, forever! It’s like what we saw for Madhava’s way of computing the sine of 10*π*, but without the “and then they grew up and settled down” ending.^{12} For more about Newton’s work on the binomial power series expansion (and the problem that led him to discover it), see another excellent Steven Strogatz article: “How Isaac Newton Discovered the Binomial Power Series”.
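To see the sluggishness concretely, here is a little sketch of mine that sums Newton’s series for (1 + *x*)^{1/2} right at the borderline value *x* = 1; the function name and the term counts are my own choices:

```python
def binomial_series(r, x, terms):
    """Partial sum of Newton's series for (1 + x)^r."""
    coef, total = 1.0, 0.0
    for n in range(terms):
        total += coef * x ** n
        coef *= (r - n) / (n + 1)   # next generalized binomial coefficient
    return total

# At x = 1 the series for the square root of 2 does converge,
# but far more slowly than it does for x close to 0.
for terms in (10, 100, 1000):
    print(terms, binomial_series(0.5, 1.0, terms))   # creeping toward 1.41421...
```

The same routine with *r* = 1/3 creeps toward the cube root of 2, exactly as the text describes.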

Newton’s formula never works if *x* is bigger than 1 or less than −1, and to get a feeling for why it fails when *x* strays too far from 0, it helps to look at the special case *r* = −1. Newton tells us that

(1 + *x*)^{−1} = 1 − *x* + *x*^{2} − *x*^{3} + …

or in other words

1/(1 + *x*) = 1 − *x* + *x*^{2} − *x*^{3} + …

Replacing *x* by −*x* and swapping the two sides of the equation we get

1 + *x* + *x*^{2} + *x*^{3} + … = 1/(1 − *x*)

This is a very old formula; the sum on the left is called an infinite geometric series (you’ve probably seen it with *x* replaced by *r*). Zeno of Elea appears to have known it in the case *x* = 1/2, and Archimedes used the case *x* = 1/4 in his computation of the area enclosed by a parabola and a straight line (as described in Strogatz’s book). This formula works for all *x* satisfying −1<*x*<1, but it fails when *x*=1 and when *x*=−1. (The partial sums in those two cases should remind you of the two rejectees at Club Cantor in last month’s essay: the partial sums of 1−1+1−1+… oscillate between 1 and 0, while the partial sums of 1+1+1+1+… just count up to infinity.) We call such infinite sums *divergent*, meaning that they fail to converge to a limit.
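Here is a small sketch of my own showing all three behaviors of the partial sums at once: convergence inside the interval, oscillation at *x* = −1, and blow-up beyond it:

```python
def geometric_partial_sums(x, count):
    """First `count` partial sums of 1 + x + x^2 + x^3 + ..."""
    sums, total, power = [], 0.0, 1.0
    for _ in range(count):
        total += power
        sums.append(total)
        power *= x
    return sums

print(geometric_partial_sums(0.5, 8))    # creeping up toward 1/(1 - 0.5) = 2
print(geometric_partial_sums(-1.0, 8))   # stuck oscillating: 1, 0, 1, 0, ...
print(geometric_partial_sums(2.0, 8))    # 1, 3, 7, 15, ...: off to infinity
```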

Things are even worse when *x* is greater than 1 or less than −1: consider 1+2+4+8+… and 1+(−2)+(4)+(−8)+…. The formula for the sum of an infinite geometric series, blindly applied with no attention to the fine print on the warning label (“Use only for *x* between −1 and 1″), suavely assures us that

1 + 2 + 4 + 8 + … = 1/(1 − 2) = −1

and

1 + (−2) + 4 + (−8) + … = 1/(1 − (−2)) = 1/3

The former sum features infinitely many positive numbers whose total is a negative number; the latter sum features infinitely many whole numbers whose total is a fraction.

Total nonsense! Or is it?

**LEONHARD EULER (1707 – 1783)**

Leonhard Euler, the greatest mathematician of the 18th century, was more tolerant of nonsense than your typical 21st century mathematician. The one-eyed genius had a many-eyed view of the mathematics of his day, including both exponential functions and trigonometric functions, and he’d devised a way to unify those two subjects via his iconic formula *e*^{ix} = cos *x* + *i* sin *x*. Euler believed that even divergent series could be valuable tools of analysis, provided one could find the right rules for manipulating them.^{13}

The key caveat in the previous paragraph is “provided one could find the right rules for manipulating them”, and the key verb in that caveat is “manipulate”. The verb that’s *missing* is “interpret”. Euler didn’t look at a sum like 1+2+4+8+… and ask himself “What might that mean?” Instead, he asked, “What value does it *want* to have?” He worked hard at divining the secret desires of divergent series. In one case, he considered the extremely badly-behaved infinite series 0! − 1! + 2! − 3! + … from four different points of view, starting from the false assumption that the series converges (it doesn’t) and seeing if his bag of numerical tricks would let him estimate the value (it did). He showed that all four of his approaches led to approximately 0.6. For him, the confluence of the different methods indicated that he’d found the “correct” way to assign a value to this *prima facie* valueless expression.

In his 1760 publication *De seriebus divergentibus*, Euler practiced a kind of extension of the Principle of Permanence. A typical illustration of his philosophy can be seen in the context of the formula 1+*x*+*x*^{2}+*x*^{3}+… = 1/(1−*x*). Since the formula holds for lots of values of *x* (specifically all numbers *x* satisfying |*x*| < 1), isn’t it sensible to guess that the formula applies in some (possibly arcane) sense to *all* values of *x*? So for instance with *x* = −1 one would get 1+(−1)+1+(−1)+… = 1/(1−(−1)) = 1/2, though I would maintain that here “=” means something closer to “wants to equal”.^{14}

We can apply Euler’s approach to series of other kinds, not just geometric series. For instance, let’s square both sides of the formula 1+*x*+*x*^{2}+*x*^{3}+… = 1/(1−*x*). On the left we get 1+2*x*+3*x*^{2}+4*x*^{3}+… and on the right we get 1/(1−*x*)^{2}. (If you prefer, you can take Newton’s binomial theorem with exponent −2 and derive the same result.) Plugging in *x* = −1 we find that 1−2+3−4+… wants to equal 1/4. So, even though this infinite series diverges, there’s a sense in which you can associate the number 1/4 with it.
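The phrase “wants to equal” can even be made quantitative in Abel’s sense: evaluate the series at values of *x* inside (−1, 1) and watch what happens as *x* slides toward −1. A quick numerical sketch of mine (the helper name is my own):

```python
def power_series_value(coefficient, x, terms=50000):
    """Sum coefficient(n) * x^n for n = 0 .. terms-1, with |x| < 1."""
    total, power = 0.0, 1.0
    for n in range(terms):
        total += coefficient(n) * power
        power *= x
    return total

# 1 + 2x + 3x^2 + 4x^3 + ... equals 1/(1-x)^2 for |x| < 1;
# as x creeps toward -1, the value creeps toward 1/4.
for x in (-0.9, -0.99, -0.999):
    print(x, power_series_value(lambda n: n + 1, x))
```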

I promised I’d tell you about the seemingly similar sum 1+2+3+4+…, and I will, but I can’t yet, because Euler’s brand of wizardry doesn’t tell us how to associate a finite value with this expression. If you plug *x* = 1 into the equation 1+2*x*+3*x*^{2}+4*x*^{3}+… = 1/(1−*x*)^{2}, you get 1+2+3+4+… = 1/(0)^{2}, which is still infinite (or undefined). The wizard of Berlin^{15} was able to give finite values to 1−1+1−1+… and 1+2+4+8+… and 1−2+4−8+… and even 1−2+3−4+… but he wasn’t able to give 1+2+3+4+… a finite value of its own.

Before we meet a different wizard with a very different approach to 1+2+3+4+…, I pause to propose a puzzle: can you figure out, in the spirit of Euler, what the sum of the Fibonacci numbers wants to be? The Fibonacci sequence goes 1, 2, 3, 5, 8, 13, …, with each new term being the sum of the two before. Euler’s approach (codified later by the mathematician Niels Abel, and now known as Abel summation) would be to find an algebraic expression for the power series 1+2*x*+3*x*^{2}+5*x*^{3}+8*x*^{4}+13*x*^{5}+…, say of the form *p*(*x*)/*q*(*x*) for suitable polynomials *p*(*x*) and *q*(*x*), and then plug in *x* = 1. I’ll give you a big hint: try *q*(*x*) = 1−*x*−*x*^{2}.^{16}

**SRINIVASA RAMANUJAN (1887 – 1920)**

I’ve already told you the story of Ramanujan in a couple of my essays, so if you need a reminder (or if you’ve never heard his story), check out “Sri Ramanujan and the secrets of Lakshmi” and “The Man Who Knew Infinity: what the film will teach you (and what it won’t)”. Just over 110 years ago, on the 27th of February in the year 1913, Ramanujan wrote to his English colleague and future collaborator G. H. Hardy, saying:

*“Dear Sir, I am very much gratified on perusing your letter of the 8th February 1913. I was expecting a reply from you similar to the one which a Mathematics Professor at London wrote asking me to study carefully Bromwich’s Infinite Series and not fall into the pitfalls of divergent series. … I told him that the sum of an infinite number of terms of the series 1+2+3+4+… = −1/12 under my theory. If I tell you this you will at once point out to me the lunatic asylum as my goal.”*

Much as Euler had non-rigorously approached 0! − 1! + 2! − 3! + … in four different ways, Ramanujan proposed two different ways to assign a value to 1 + 2 + 3 + 4 + … and both approaches led to the value −1/12. Hardy was astonished. This unknown amateur mathematician in Madras was doing mathematics in the freewheeling style of an Euler, deriving results that went far beyond what Euler (and indeed beyond what Hardy and his contemporaries) had done!

This is not to say that Ramanujan was ignorant of Euler’s work; indeed, his approach to divergent series arose from a piece of mathematics (amazing in its own way) called the Euler-Maclaurin formula. But Ramanujan took this formula to places where no one had taken it before.

Hardy built a bridge between Ramanujan’s mathematics and mainstream number theory a few years later with his long-time collaborator J. E. Littlewood, when they laid the foundations of what has come to be called zeta-function regularization of divergent series. The roots of the method lie in other work of Euler, who had studied 1 + (1/2)^{s} + (1/3)^{s} + (1/4)^{s} + … for various integer values of *s*, memorably proving that when *s* = 2, the series sums to *π*^{2}/6. Bernhard Riemann later showed that allowing values of *s* in the complex plane yielded a function whose intricacies were intertwined with profound mysteries about prime numbers. Riemann’s zeta function remains an object of deep fascination, and the most celebrated open problem in mathematics, the Riemann Hypothesis, centers on it.
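Euler’s *s* = 2 evaluation is easy to test numerically. Here is a sketch of my own; since the raw partial sums converge slowly (the tail beyond *N* is roughly 1/*N*), the sketch also adds that tail estimate as a cheap correction:

```python
from math import pi

N = 10000
partial = sum(1 / n ** 2 for n in range(1, N + 1))

print(partial)           # falls short of the limit by about 1/N
print(partial + 1 / N)   # with the tail estimate: much closer to pi^2/6
print(pi ** 2 / 6)       # the exact value, 1.6449340668...
```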

One often sees the zeta function defined by the formula ζ(*s*) = 1^{−s} + 2^{−s} + 3^{−s} + …, but the right hand side defines ζ(*s*) only for complex values of *s* whose real part exceeds 1. However, there is a natural way to extend the domain of definition of ζ by way of a conceptual procedure called analytic continuation. Analytic continuation is sort of a super-powered version of Euler’s principle of domain-extension. Euler knew that there’s one and only one algebraic function that has the same value as the series 1 + *x* + *x*^{2} + … for all *x* between −1 and 1, namely, the algebraic function 1/(1−*x*). In an analogous but more difficult way, Riemann showed that there’s one and only one (jargon alert!) “meromorphic function”^{17} that assigns to each complex number *s* ≠ 1 a complex number ζ(*s*) subject to the requirement that ζ(*s*) = 1^{−s} + 2^{−s} + 3^{−s} + … for all *s* with real part exceeding 1.

Euler had shown that ζ(2) = *π*^{2}/6. Riemann showed (among many other things) that ζ(0) = −1/2 and ζ(−1) = −1/12,^{18} and the latter gave Hardy a way to understand what Ramanujan’s manipulations were doing. Just as Euler was led to equate 1 + 2 + 4 + … with −1 by plugging the off-label value *x*=2 into the formula 1 + *x* + *x*^{2} + … = 1/(1−*x*) (a formula valid only for *x* between −1 and 1), Hardy corroborated Ramanujan’s “lunacy” by plugging the illicit value *s*=−1 into the formula 1^{−s} + 2^{−s} + 3^{−s} + … = ζ(*s*) (a formula valid only for *s* with real part exceeding 1). With *s*=−1 the left hand side becomes 1^{1} + 2^{1} + 3^{1} + … which *doesn’t converge* to any real number, but the right hand side evaluates to −1/12, which convinced Hardy that the divergent series *wants* to converge to −1/12.

**TRUTH IN LABELLING**

Both Euler’s approach to 1+2+3+4+… and Ramanujan’s approach start from the idea that you shouldn’t think about that particular numerical expression in isolation; you should think of it as the value taken on by some function *f*(*x*) for some particular value of *x*. But *which* function? And *which* particular value? There’s the rub. Euler treated 1+2+3+4+… as the value associated with the function 1+2*x*+3*x*^{2}+4*x*^{3}+… at *x* = 1, and (after algebraic extrapolation) got the answer ∞. Ramanujan and Hardy and Littlewood treated 1+2+3+4+… as the value associated with the function 1^{x}+2^{x}+3^{x}+4^{x}+… at *x* = 1, and (after analytic continuation) got the answer −1/12.

More precisely, Hardy and Littlewood did that, inspired by Ramanujan’s work. Ramanujan himself didn’t mention the zeta function; he just did some manipulations of the series. But *which* manipulations are we allowed to do? There’s the (other) rub. There are ways to manipulate Ramanujan’s sum so as to lead to different conclusions than Ramanujan’s. For instance, there’s a way to “prove”^{19} that 1+2+3+4+… is 0, and there are *infinitely* many ways to “prove” that 1+2+3+4+… is −1/8. There are good reasons to prefer Ramanujan’s manipulations to these, but anyone who shows you Ramanujan’s derivation without explaining why his way of juggling symbols is profound and the others are mere curiosities isn’t telling you the whole story.

I think it’s misleading to say that 1+2+3+4+… *equals* −1/12. It would be better to say something more like “the divergent series 1+2+3+4+… is associated with the value −1/12” or “the zeta-regularized value of the series 1+2+3+4+… is −1/12” or “The Ramanujan constant of the series 1+2+3+4+… is −1/12.” Phrasing the result this way isn’t as catchy as asserting equality, but it’s more honest (while at the same time more respectable-sounding than “1+2+3+4+… wants to be −1/12”).

If you’re still inclined to buy the formula 1+2+3+4+… = −1/12, then it’s my duty to point out to you something else that you’re about to buy as part of the same deal. Remember ζ(0)? I told you earlier that it equals −1/2. So, taking the formula

1 + 2 + 3 + 4 + … = −1/12

and subtracting the equally valid formula

1 + 1 + 1 + 1 + … = −1/2

from it term by term, we get

0 + 1 + 2 + 3 + … = −1/12 − (−1/2) = 5/12

But the left hand side of that last equation is Ramanujan’s 1+2+3+4+… with a 0 stuck at the front! What kind of number *x* has the property that 0+*x* is different from *x*? George Peacock, the originator of the Principle of Permanence, would have had a heart attack, and even Euler would have balked. Forget Kansas, Toto – we’re not even in Oz anymore!

I don’t want to come across as too critical of the irresponsible boggle-mongers who share tawdry mathematical factoids ripped from their proper contexts … even though I guess I did just call them “irresponsible boggle-mongers”. I mean, I know there are weirdness-junkies who want to have their minds blown by math and science on a regular basis, and there need to be dealers who provide those users with their fix. All I’m asking for is more truth in labelling, so that the people who are tripping on mathematical psychedelicacies don’t mistake what they’re consuming for actual food (just as responsible purveyors of hallucinogens try to keep their clients away from windows that could be mistaken for doors).

Heck, to show you that there are no hard feelings here, I’ll purvey some mathematical psychedelia of my own. Pssst, hey kid: didja know that the product of all the primes equals 4*π*^{2}? No lie … except for the word “equals”.

*Thanks to Jeremy Cote, Sandi Gubin, Joseph Malkevitch, Cris Moore, Henri Picciotto, Burkard Polster, Tzula Propp, and Glen Whitney.*

**ENDNOTES**

#1. If you want to generate a sine curve, you can roll your own: see the how-to guide Cut a Sine Wave with One Straight Cut. (Don’t flatten the roll when you cut it; that’ll give you a sawtooth wave instead of a sine wave!) For extra credit, make two cuts in the cardboard tube, one at each end, so that you get two sine waves. Can you arrange the cuts so that the two sine waves are in phase but have different amplitudes? Can you arrange the cuts so that the two sine waves have the same amplitude but are 90 degrees out of phase? In principle you could “slice” one end of the cardboard using a hyperbolic paraboloid, obtaining a sine wave with twice the frequency of the sine wave obtained by slicing with a plane, but I have no idea how to do that in practice.

#2. Calculators don’t actually approximate trig functions using power series. As we’ll see shortly, when *x* gets too far from 0, the power series for sin *x* and cos *x* have enormous terms that wreak havoc with numerical approximations. But figuring out the power series expansions of sine and cosine was the first step that led to the internal machinations that you trigger in your calculator when you push the “sin *x*” and “cos *x*” buttons.

#3. In calling Madhava a “mathematician” I’m already guilty of imposing a 21st century perspective on an earlier era. Disciplines weren’t as compartmentalized then as they are now. He might well have considered astronomy to be his chief interest and mathematics a sideline.

#4. Actually, I’m lying, but just a little bit. If I’d been scrupulously honest, you’d see some nuisance-factors cluttering up these equations; those nuisance-factors are powers of a single conversion ratio relating angles to lengths, and they disappear if you measure angles in radians rather than more familiar units like degrees. To keep my formulas uncluttered, I’ll use radians and omit the nuisance-factors.

#5. To see why it makes sense to define *x*^{0} as 1, notice that for all *n* > 0, *x*^{n} equals *x* times *x*^{n−1}; if that pattern is to persist all the way down to *n* = 1, we need *x*^{1} = *x* times *x*^{0}, which forces *x*^{0} = 1 (at least when *x* ≠ 0).

#6. The rule for the product of two power series might seem confusing at first, but it’s just like the rule for multiplying polynomials: when you have one sum times another sum and want to expand it as a sum of products rather than a product of sums, each individual term in the first sum must be multiplied by each individual term in the second sum. In particular, for all nonnegative integers *i* and *j*, we must multiply *a*_{i}*x*^{i} by *b*_{j}*x*^{j}, obtaining *a*_{i}*b*_{j}*x*^{i+j}; collecting all the products that contribute to a given power of *x*, we see that the coefficient of *x*^{n} in the product is *a*_{0}*b*_{n} + *a*_{1}*b*_{n−1} + … + *a*_{n}*b*_{0}.

#7. A basic formula of trig that follows from the definitions of sine and cosine and from the Pythagorean theorem is that (sin *x*)^{2} + (cos *x*)^{2} = 1; so it’s not exactly earth-shattering news that when we add the squares of the power series representations of sine and cosine, obtaining *x*^{2} − *x*^{4}/3 + 2*x*^{6}/45 −… and 1 −* x*^{2} + *x*^{4}/3 − 2*x*^{6}/45 +… respectively, massive cancellation occurs and we get 1 + 0*x*^{2} + 0*x*^{4} +…. But it’s surprising to me that we can derive this deeply geometrical fact about sine and cosine using just algebra. Evidently those power series have lots of geometry hiding inside them!

#8. Using the trig formula sin(*x*+*y*) = (sin *x*) (cos *y*) + (cos *x*) (sin *y*) we can show that sin(*x*+*d*) − sin(*x*−*d*) = (2 sin *d*) (cos *x*); that is, as an angle increases from *x*−*d* to *x*+*d*, the sine of the angle increases by (2 sin *d*)(cos *x*). Here 2 sin *d* is the constant of proportionality, but its precise value is of secondary importance; what’s crucial is that the change in the value of the sine function is proportional to the cosine of *x*, and has the same sign. Likewise, as an angle increases from *x*−*d* to *x*+*d*, the cosine of the angle increases by (−2 sin *d*) sin *x*; the change is proportional to the sine of *x*, and has opposite sign.

#9. Here’s an historically inaccurate and incompletely rigorous but undeniably slick way to derive those power series *à la* Madhava, expressed in the language of modern calculus. Since cos *x* is an even function of *x* (that is, cos(−*x*) = cos(*x*) for all *x*) its power series has only even exponents of *x*:

cos *x* = *a*_{0} + *a*_{2}*x*^{2} + *a*_{4}*x*^{4} + ….

Similarly, sin *x* is an odd function of *x* (that is, sin(−*x*) = − sin(*x*) for all *x*) so its power series has only odd exponents of *x*:

sin *x* = *b*_{1}*x* + *b*_{3}*x*^{3} + *b*_{5}*x*^{5} + ….

Taking derivatives term by term we see that the derivative of cos *x* is

*d*(cos *x*)/*dx* = 2*a*_{2}*x*^{1} + 4*a*_{4}*x*^{3} + …

and that the derivative of sin *x* is

*d*(sin *x*)/*dx* = *b*_{1} + 3*b*_{3}*x*^{2} + 5*b*_{5}*x*^{4} + ….

Equating cos *x* with the derivative of sin *x* term by term we get *a*_{0} = *b*_{1}, *a*_{2} = 3*b*_{3}, *a*_{4} = 5*b*_{5}, …. Similarly equating sin *x* with the negative of the derivative of cos *x*, we get *b*_{1} = −2*a*_{2}, *b*_{3} = −4*a*_{4}, …. And we have one more equation as our linchpin: *a*_{0} = 1, corresponding to the fact that the cosine of 0 is 1. Turning these equations around and putting them all together, we get *b*_{1} = *a*_{0} = 1, *a*_{2} = −*b*_{1}/2 = −1/2, *b*_{3} = *a*_{2}/3 = −1/6,* a*_{4} = −*b*_{3}/4 = 1/24, *b*_{5} = *a*_{4}/5 = 1/120, …
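Expressed as code, the crank-turning is a short loop. Here is a sketch of my own in exact rational arithmetic, using the relations just derived: *b*_{k+1} = *a*_{k}/(*k*+1) from equating cos *x* with the derivative of sin *x*, and *a*_{k+2} = −*b*_{k+1}/(*k*+2) from equating sin *x* with the negative of the derivative of cos *x*:

```python
from fractions import Fraction

a = {0: Fraction(1)}   # cosine coefficients; a_0 = 1 since cos(0) = 1
b = {}                 # sine coefficients

for k in range(0, 10, 2):
    b[k + 1] = a[k] / (k + 1)        # cos is the derivative of sin
    a[k + 2] = -b[k + 1] / (k + 2)   # -sin is the derivative of cos

print(b)   # sine coefficients 1, -1/6, 1/120, -1/5040, ...
print(a)   # cosine coefficients 1, -1/2, 1/24, -1/720, ...
```

Each turn of the loop spits out one new coefficient of each series, exactly as the “rotary engine” metaphor in the text promises.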

#10. I recently read about a radiocarbon/macrofossil/microfossil/DNA study that challenges the standard Land Bridge Theory of how North America was populated; it seems that one particular long stretch of the Beringian causeway was too barren to have calorically sustained so many humans on foot. But I prefer to think that this study provides support for Bicycle Theory.

#11. For a more detailed version of this story, read chapter 7 of the book by Strogatz.

#12. Although we can’t get approximations to the square root of 3 by simply plugging *x* = 2 into Newton’s formula, we can still exploit the formula if we’re clever. Specifically, by setting *x* = −1/4 in Newton’s formula we can estimate the square root of 3/4, and then we can double our approximation of the square root of 3/4 to get an approximation of the square root of 3. Such tricks are built into what your calculator does when you press the square-root button, and they can also be useful for doing mental math. If you want to estimate the square root of some number *n*, and you happen to know a perfect square *m*^{2} near *n*, then write *n* as *m*^{2} times 1+*x* (where *x* can be positive or negative as long as it’s close to 0); then the square root of *n* equals the square root of *m*^{2}(1 + *x*), which is approximately *m* times 1 + x/2. So if *n* is (say) 10% bigger than *m*^{2}, the square root of *n* will be about 5% bigger than *m*.
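As a sketch of that mental-math trick (the function name `rough_sqrt` is my own invention), the two-term estimate looks like this:

```python
def rough_sqrt(n, m):
    """Estimate sqrt(n) from a nearby perfect square m^2, via
    sqrt(m^2 * (1 + x)) ~ m * (1 + x/2), the first two terms
    of Newton's series with exponent 1/2."""
    x = n / (m * m) - 1   # write n as m^2 * (1 + x)
    return m * (1 + x / 2)

print(rough_sqrt(10, 3))   # 3.1666..., versus the true 3.16227...
print(rough_sqrt(3, 2))    # 1.75, versus the true 1.73205...
```

The second call is exactly the footnote’s *x* = −1/4 trick: estimate the square root of 3/4 and double it.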

#13. Was Euler’s hope fulfilled? I am tempted to say “no”, but it might be more prudent to say that the jury is still out. Mathematicians don’t have much use for divergent series these days, but physicists seem to have a higher tolerance for them. String theorists actively like them. (Of course they do.)

#14. One problem with allowing 1 + −1 + 1 + −1 + … into math is that it collides with the associative property. To see the issue, put parentheses into this sum in two different ways, first as (1+−1)+(1+−1)+… and then as 1+(−1+1)+(−1+1)+… The former way of putting in parentheses gives us the compacted sum 0 + 0 + … while the latter way of putting in parentheses gives us the compacted sum 1 + 0 + 0 + …. The first compacted sum evaluates to 0, while the second sum evaluates to 1, so we’re in trouble! Euler’s predecessor Guido Grandi noticed this paradox, though he found it consistent with prevailing ideas in religion and cosmology: “By putting parentheses into the expression 1−1+1−1+… in different ways, I can, if I want, obtain 0 or 1. But then the idea of the creation *ex nihilo* is perfectly plausible.” It should be noted, however, that from a purely mathematical perspective Grandi, like Euler, favored the value 1/2.

#15. Euler is often linked with his hometown, Basel, but he spent much of his career in St. Petersburg, and he resided in Berlin when he wrote his treatise on divergent series.

#16. Multiplying 1+2*x*+3*x*^{2}+5*x*^{3}+8*x*^{4}+13*x*^{5}+… by 1−*x*−*x*^{2}, we get massive cancellation, leaving 1+*x*+0*x*^{2}+0*x*^{3}+0*x*^{4}+ …. So our “Fibonacci power series” is associated with the function (1+*x*)/(1−*x*−*x*^{2}), and plugging in *x* = 1 yields −2 … though I don’t know what it means to say “The sum of the Fibonacci numbers is −2”.
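The "massive cancellation" is easy to verify with a naive polynomial multiplication (a Python sketch; both function names are mine):

```python
def fib_coeffs(n):
    # the coefficients 1, 2, 3, 5, 8, 13, ... of the "Fibonacci power series"
    a, b, out = 1, 2, []
    for _ in range(n):
        out.append(a)
        a, b = b, a + b
    return out

def poly_mul(p, q):
    # naive polynomial multiplication (convolution of coefficient lists)
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

prod = poly_mul(fib_coeffs(10), [1, -1, -1])   # multiply by 1 - x - x^2
print(prod[:10])   # [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]: everything past 1 + x cancels
```

(The last couple of entries of `prod` are nonzero only because the sample series was truncated at ten terms.)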

#17. I won’t attempt to define “meromorphic”, but I’ll tell you what makes meromorphic functions magical. You know how each cell in your body (with the exception of a few in your naughty bits) contains your complete genome, and hence would allow someone to create an exact clone of you, at least in principle? Meromorphic functions are a lot like that. If you know the value of a meromorphic function *f*(*z*) for all complex numbers *z* in some tiny disk *D* in the complex plane, you can in principle reconstruct the value of *f*(*z*) for *all* values of *z*. It doesn’t matter how tiny the disk *D* is; if *f* is meromorphic, the way *f* behaves on *D* uniquely determines the way *f* behaves on the whole complex plane.

#18. Here I’m switching from explaining things that I understand from my own knowledge to reporting things that I know only second-hand; I’ve never read the proofs that ζ(0) = −1/2 and ζ(−1) = −1/12. If you want to delve more deeply into these matters, the Wikipedia pages on 1+1+1+1+… and 1+2+3+4+… listed at the end of the References would be a good place to start.

#19. If *S* = 1+2+3+… then 2*S* = 1+1+2+2+3+3+… = 1+(1+2)+(2+3)+… = 1+3+5+… = (1+2+3+…) − (2+4+6+…) = *S* − 2*S* = −*S*, and 2*S* = −*S* implies *S* = 0.

**REFERENCES**

E. Barbeau, Euler Subdues a Very Obstreperous Series, The American Mathematical Monthly, May, 1979, Vol. 86, No. 5, pp. 356–372.

BriTheMathGuy, I Found Out What Infinity Factorial Is.

E. Muñoz García, R. Pérez Marco, The Product Over All Primes is 4*π*^{2}, Communications in Mathematical Physics, volume 277, pages 69–81 (2008).

Victor J. Katz, Ideas of Calculus in Islam and India, Mathematics Magazine, Vol. 68, No. 3 (Jun., 1995), pp. 163–174. http://www.jstor.org/stable/2691411

MathOverflow, Do Abel summation and zeta summation always coincide?

Mathologer, Numberphile v. Math: the truth about 1 + 2 + 3 + … = −1/12

Numberphile, ASTOUNDING: 1+2+3+4+5+… = −1/12

Ian Stewart, Significant Figures: The Lives and Work of Great Mathematicians, Basic Books, 2017.

Steven Strogatz, Infinite Powers: How Calculus Reveals the Secrets of the Universe, Mariner Books, 2019.

Wikipedia, Madhava series.

Wikipedia, 1+1+1+1+….

Wikipedia, 1+2+3+4+….

Mathematician Henri Poincaré once wrote “Mathematics is the art of giving the same name to different things,” and he wasn’t wrong, exactly. He was thinking about the way mathematics advances by generating new concepts that unify old ones. For instance, mathematicians noticed that adding 0 to a number, like multiplying a number by 1, doesn’t change the number you apply it to. Eventually they celebrated this resemblance between 0 and 1 by coming up with new vocabulary: nowadays we say that 0 and 1 are “identity elements” (the former for addition, the latter for multiplication).^{1} Two different things, same name.

But giving different things the same name is only half the story. Mathematics also invites us – and frequently *requires* us – to give different names to the same thing.^{2} Seventeen isn’t just 17. It’s also 10001_{two}. It’s the fraction 34/2 (or the mixed number 16 2/2, if we’re feeling goofy). It’s the real number 17.000… and the real number 16.999…. It’s the complex number 17 + 0*i*.

Actually, is 16 2/2 all that goofy? If I’m doubling 8 1/2, isn’t 16 2/2 a reasonable intermediate stage in the calculation? Carrying this idea further, we can conceive of calculation as the art of transforming names like 8 + 9 into names like 17. “17 × 23” may be a starting point for a school exercise, but it isn’t a question; it’s already an answer, or close to an answer. We just need to convert this number-name into a different sort of number-name that’s more useful for most purposes (though not for all purposes, since if our ultimate goal is to compute 17 × 23 + 17 × 77, it’s better to rewrite that expression as 17 × (23 + 77) = 17 × 100). The fundamental act of reckoning on a decimal abacus – trading ten beads in the ones column for one bead in the tens column – can be viewed as the act of trading one name for another. Adding fractions with different denominators requires us to rewrite one or both fractions, replacing a fraction by an equivalent fraction that names the same rational number. Allowing things to have more than one name is precisely what makes reckoning possible.

The example of adding fractions reminds us that having more Names than Things is useful when we’re building a new number system from an old one. We could just say that a fraction is specified by two whole numbers that have no factor in common (the numerator and the denominator), but it’s more convenient for both practical and theoretical purposes to allow rational numbers to have extra names in which the numerator and denominator have a common factor.

The modest price we pay for having more Names than Things is that we have to specify when two Names name the same Thing. For instance, in the case of fractions, we have a rule that says that the fractions *a*/*b* and *c*/*d* are different names for the same number when *ad* = *bc* (and that they represent different numbers otherwise). We can use this rule and the rules

*a*/*b* + *c*/*d* = (*ad* + *bc*)/(*bd*)  and  *a*/*b* × *c*/*d* = (*ac*)/(*bd*)

to create a purely formal approach to fraction arithmetic whose rules teach us nothing about what fractions mean but tell us everything we could want to know about how to add, subtract, multiply, or divide fractions. William Rowan Hamilton, who went on to invent quaternions, was one of the first to approach fractions in this formal way.^{3} His point was not that this is a good way to think about fractions for practical applications; rather, he was demonstrating that you can create a new number system (the rational numbers) from a smaller number system (the integers) even before you have an interpretation for the new number system or know what it might be good for, just by specifying good rules.
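Hamilton's formal stance can be sketched in a few lines of Python: pairs of integers, plus rules for combining pairs and for declaring two pairs to be the same, and never a word about what fractions mean. (The class `Frac` and its details are my own illustrative choices, not Hamilton's notation.)

```python
class Frac:
    """A fraction treated purely formally: a pair of integers plus good rules."""
    def __init__(self, a, b):
        self.a, self.b = a, b            # numerator, denominator (b nonzero)
    def __eq__(self, other):             # a/b names the same Thing as c/d iff ad = bc
        return self.a * other.b == other.a * self.b
    def __add__(self, other):            # a/b + c/d = (ad + bc)/(bd)
        return Frac(self.a * other.b + other.a * self.b, self.b * other.b)
    def __mul__(self, other):            # a/b * c/d = (ac)/(bd)
        return Frac(self.a * other.a, self.b * other.b)

# 1/2 and 2/4 are different Names for the same Thing...
assert Frac(1, 2) == Frac(2, 4)
# ...and the rules respect that sameness:
assert Frac(1, 2) + Frac(1, 3) == Frac(2, 4) + Frac(1, 3)
```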

What’s a good rule? Well, let’s look at a not-so-good one. If we define *a*/*b* ⊕ *c*/*d* = (*a* + *c*)/(*b* + *d*) (this is the mediant operation that I briefly mentioned last month), then 1/2 ⊕ 1/1 = 2/3 and 1/2 ⊕ 2/2 = 3/4. This is a problem: since 1/1 and 2/2 are two different names for the same number, we should get the same value for 1/2 ⊕ 1/1 and 1/2 ⊕ 2/2; yet we got 2/3 and 3/4, which are not two different names for the same number (check: 2 × 4 ≠ 3 × 3). A good rule is one that won’t lead to this kind of contradiction. Then again, maybe we shouldn’t one-sidedly put all the blame on ⊕; it might be better to say that there’s a mismatch between how ⊕ is defined and our rule for recognizing when two fractions name the same rational number.
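Here's that failure in executable form (a sketch; `mediant` and `same_rational` are names I've made up for this check):

```python
def mediant(a, b, c, d):
    # the not-so-good rule: (a/b) ⊕ (c/d) = (a + c)/(b + d)
    return (a + c, b + d)

def same_rational(p, q):
    # the pairs (a, b) and (c, d) name the same rational exactly when ad = bc
    return p[0] * q[1] == p[1] * q[0]

assert same_rational((1, 1), (2, 2))    # 1/1 and 2/2: same Thing
left  = mediant(1, 2, 1, 1)             # 1/2 ⊕ 1/1 gives (2, 3), i.e. 2/3
right = mediant(1, 2, 2, 2)             # 1/2 ⊕ 2/2 gives (3, 4), i.e. 3/4
assert not same_rational(left, right)   # ...but the two answers disagree!
```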

Just as Hamilton built up the rational numbers from the integers, today I’ll show you how to build up the real numbers from the rational numbers.

**THE PROBLEM**

Why aren’t mathematicians satisfied with the rational numbers? After all, rational numbers can be used to approximate any irrational number as closely as one might wish. And the rational numbers form a tidy system in which one can perform addition, subtraction, multiplication, and division on any two elements and get something sensible (as long as we don’t try to divide something by zero). So why use irrational numbers at all?

Here’s a paradox to explain part of what’s wrong with the rational numbers as far as coordinate geometry is concerned. Consider the circle of radius 1 centered at (0, 0). If you draw a line from (−1, −1) to (0, 0) and on to (1, 1), you start outside the circle, cross into the circle, and then cross back out of the circle. But where exactly do the two crossings take place? We can see them with our finite-resolution eyes when we look at our blobby sketches, but if we insist on infinite precision and we’re only allowed to use rational numbers, the crossing points are nowhere to be found. That’s because any intersection point (*x*,* y*) between the circle and the line would have to satisfy both *x*^{2} + *y*^{2} = 1 (the equation defining the circle) and *y* =* x* (the equation defining the line). That is, we’d need to have 2*x*^{2} = *x*^{2} + *x*^{2} = *x*^{2} + *y*^{2} = 1, so that *x*^{2} = 1/2. But there is no rational number *x* satisfying this equation; if there were, its reciprocal would be a rational number whose square was equal to 2, and we’ve already seen (back in my November 2022 essay “The Infinite Stairway”) that there is no such rational number. So in “rational geometry”, we find that a line can pass through the interior of a circle and out the other side without ever cutting the circle at a point! The Greeks would have been confounded by such a situation, since the basis of their geometry was drawing lines and circles and marking points of intersection. What’s going on is that if we use rational numbers as coordinates then the seemingly solid line and circle are riddled with gaps, as is the horizontal number line itself.

When we’re trapped in a number system that has gaps in it, we can’t use a famous result called the Intermediate Value Theorem. This theorem is a bedrock of the calculus, and is used to show that when we talk about the region bounded by a closed curve (for instance), the area of that region is a definite number whether or not we know how to compute it. In a more recreational vein, the Intermediate Value Theorem plays a key role in certain sorts of puzzles, such as the following: Show that, if you hike up a mountain on Monday and hike back down on Tuesday, there must be an instant on Tuesday when you’re at the exact same altitude that you were at exactly 24 hours earlier. The solution involves superimposing the graph that shows your altitude as a function of time on Monday with the graph that shows your altitude as a function of time on Tuesday; the puzzle is solved by observing that the graphs of the two functions (one increasing from some low number *a* to some high number *b* and the other decreasing from *b* to *a* during that same time interval) must cross. But if a line can pass into and out of a circle without piercing its skin, why must the two graphs cross? If our number system has gaps, the two graphed curves might pass through each other without crossing anywhere. This won’t do.

In 1872 mathematician Georg Cantor found a way to fill the gaps in the rational number line and to construct, not just some special irrational numbers like sqrt(2) and π, but *all of them at once*. He showed that the rational numbers already yearn to give birth to the irrational numbers by a process we call completion (metric completion, to be more specific). The roots of his construction go back thousands of years, to when various cultures found systematic ways to generate better and better rational approximations to the square root of two. You saw some of these approximations if you read “The Infinite Stairway”, though you might not have recognized the approximations as such. There I presented an ancient method of generating more and more solutions to the equation 2*a*^{2} − *b*^{2} = ±1. Since the numbers *a* and *b* quickly get large, in relative terms the pairs *a*, *b* satisfy 2*a*^{2} − *b*^{2} ≈ 0, or *b*^{2} ≈ 2*a*^{2}, or *b*^{2}/*a*^{2} ≈ 2, or (*b*/*a*)^{2} ≈ 2. Take the diagram from the section “Up the Down Staircase” of the November 2022 essay and flip it upside down, inserting some fraction lines:

You see the beginning of the infinite sequence of fractions 1/1, 3/2, 7/5, 17/12, 41/29, 99/70, …: an interleaving of the increasing sequence 1/1, 7/5, 41/29, … and the decreasing sequence 3/2, 17/12, 99/70, ….
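If you'd like to generate these fractions yourself: one recurrence that reproduces the list is (*a*, *b*) → (*a* + *b*, 2*a* + *b*), which flips the sign of 2*a*^{2} − *b*^{2} at each step. A Python sketch (the function name is mine):

```python
from fractions import Fraction

def sqrt2_convergents(n):
    # fractions b/a with 2a² - b² = ±1; each step sends (a, b) to (a + b, 2a + b)
    a, b, out = 1, 1, []
    for _ in range(n):
        assert 2 * a * a - b * b in (1, -1)   # the staircase identity holds
        out.append(Fraction(b, a))
        a, b = a + b, 2 * a + b
    return out

print(sqrt2_convergents(6))   # the six fractions listed above, 1/1 through 99/70
```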

Here’s what we see when we plot 1/1, 7/5, 41/29, and 239/169 (the first four terms of the increasing sequence) on the number line:

At first glance it looks like just three dots, not four, but if you look carefully you’ll see that the blue blob on the right is two dots mostly superimposed. And here’s what we see when we plot the next four terms of that sequence on the number line:

Again, that rightmost blob is really two dots ever-so-slightly displaced from one another. It certainly seems as if this increasing sequence is “trying” to converge to something on the number line that’s outside of the rationals, and we could define sqrt(2) as the Thing the sequence is “trying” to converge to.

That definition seems circular, doesn’t it? But we’ve seen similar mathematical hocus-pocus before. Consider: “3 ÷ 2” is the name of a division problem that we can’t solve in the counting numbers, so we created 3/2, essentially naming it after the problem it lets us solve. Likewise, “0 − 2” is the name of a subtraction problem that we can’t solve in the counting numbers, so we created −2, again naming it after the problem it lets us solve. Solving a problem by giving the problem a name and then proposing the name as a solution to the problem sounds like answering-a-question-with-a-question at best and circular-reasoning sophistry at worst. But we’ve seen that if we do it properly, it works, at least in math. So if we can’t find a limiting value of the sequence 1/1, 7/5, 41/29, … within the set of rational numbers, let’s define a new mathematical beast called lim(1/1, 7/5, 41/29, …) (where the dots represent all the terms of the sequence beyond 41/29) and give good rules for working with it.

Hence, our brave first draft for a model of the set of real numbers is Names of the form lim(*a*_{1}, *a*_{2}, *a*_{3}, *a*_{4}, …) where *a*_{1}, *a*_{2}, *a*_{3}, *a*_{4}, … is some arbitrary infinite sequence of rational numbers. Call this Draft #1. It’s a nice try, but it won’t do. Let’s see what’s wrong with it.

**CAUCHY TO THE RESCUE**

We want to help 1/1, 7/5, 41/29, … converge to something, but that doesn’t mean we want to help just *any* old sequence converge to a limit in our envisaged real number system. I mean, the sequence 1, 2, 3, 4, 5, 6, … isn’t even trying to converge. And 1, 0, 1, 0, 1, 0, … is just messing with us.

What makes one feel that the sequence 1/1, 7/5, 41/29, … deserves to have a limit to converge to, but that the sequences 1, 2, 3, 4, … and 1, 0, 1, 0, … don’t?

French mathematician Augustin-Louis Cauchy had an answer to this question fifty years before Cantor asked it.^{4}

From some point onward, all the terms of the sequence 1/1, 7/5, 41/29, … differ from one another by less than 0.1. Also: from some point onward (further on in the sequence), all the terms of the sequence 1/1, 7/5, 41/29, … differ from one another by less than 0.01. In fact, no matter what tiny (but positive) constant *c* I pick, you can always find a place in the sequence with the property that all the terms to the right of the place you found differ from each other by less than *c*. This “bunching up” property is called the *Cauchy property*. The sequence 0.3, 0.33, 0.333, …, consisting of the successive decimal approximations to 1/3, has the Cauchy property. So does our sequence 1/1, 7/5, 41/29, … of increasing approximations to the square root of 2. So does the sequence 1, 1.4, 1.41, 1.414, … of increasing decimal approximations to the square root of 2. On the other hand, the sequences 1, 2, 3, 4, … and 1, 0, 1, 0, … don’t satisfy the Cauchy property. So let’s cut our universe of infinite sequences of rational numbers down to size, or at least make it a bit more manageable, by culling the sequences that don’t satisfy the Cauchy property. Or if “culling” sounds too ruthless, let’s imagine stationing a bouncer at the door of “Club Cantor” and instructing our bouncer to reject all sequences that don’t observe Cauchy’s rule of decorum.
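No finite computation can certify the Cauchy property outright (it's a claim about infinitely many tails), but we can at least watch the bunching-up happen on finite samples. A Python sketch, with names of my own invention:

```python
from fractions import Fraction

def settles_within(seq, c):
    # First index such that all later terms of this finite sample differ
    # by less than c; None if the sample never bunches up that tightly.
    for start in range(len(seq) - 1):
        tail = seq[start:]
        if max(tail) - min(tail) < c:
            return start
    return None

approx = [Fraction(p, q) for p, q in
          [(1, 1), (3, 2), (7, 5), (17, 12), (41, 29), (99, 70), (239, 169)]]
print(settles_within(approx, Fraction(1, 100)))   # a small index: the tail bunches up
print(settles_within(list(range(20)), 1))         # None: marches off to infinity
print(settles_within([1, 0] * 10, 1))             # None: keeps oscillating
```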

We thus arrive at Draft #2 of the set of real numbers: all names of the form lim(*a*_{1}, *a*_{2}, *a*_{3}, …) where *a*_{1}, *a*_{2}, *a*_{3}, … is a sequence of rational numbers that satisfies the Cauchy property. (Mathematicians call such sequences “Cauchy sequences” for short.)

This proposal is also a failure, but for a totally different reason.

**GET OUT YOUR GLUE GUNS**

We can see the problem if we look at lim(1/1, 7/5, 41/29, …) and lim(3/2, 17/12, 99/70, …). The former sequence is increasing toward the gap in the number line where we’re going to construct sqrt(2) – the construction site, you might call it – while the latter sequence is decreasing toward that very same gap. Surely we want them to be the same real number! For that matter, lim(1, 1, 1, 1, …) and lim(.9, .99, .999, .9999, …) and lim(1.1, 1.01, 1.001, 1.0001, …) are different Names, so in Draft #2 of the real number system they are different Things. We’re having doppelgänger-management issues again, but in a much bigger way than last month: now every rational number has infinitely many doppelgängers, and so do all the gaps in the number line (the future homes of the irrational numbers) that we’re trying to fill!

Just as Hamilton (and others before him, of course) gave us a way to recognize when two different fractions represent the same rational number, we’ll need a way to recognize when two different sequences represent the same rational or irrational number. Hamilton said that *a*/*b* and *c*/*d* are two different names for the same thing when *ad* = *bc*; what’s the corresponding trick for recognizing when lim(*a*_{1}, *a*_{2}, *a*_{3}, …) and lim(*b*_{1}, *b*_{2}, *b*_{3}, …) are two different names for the same thing?

We can get a clue by considering the concrete case of 1/1, 7/5, 41/29, … and 3/2,17/12,99/70, …. Here’s what we get when we plot the first three terms of the increasing sequence in blue and the first three terms of the decreasing sequence in red:

The two sequences are getting closer and closer to each other. If we subtract the first sequence from the second sequence term-by-term, the differences 3/2 − 1/1, 17/12 − 7/5, 99/70 − 41/29, … get closer and closer to 0.
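We can make the shrinkage concrete with exact rational arithmetic. A Python sketch (I've extended each sequence one more term, to 239/169 and 577/408, following the pattern):

```python
from fractions import Fraction

lower = [Fraction(1, 1), Fraction(7, 5), Fraction(41, 29), Fraction(239, 169)]
upper = [Fraction(3, 2), Fraction(17, 12), Fraction(99, 70), Fraction(577, 408)]

# subtract the increasing sequence from the decreasing one, term by term
diffs = [u - l for l, u in zip(lower, upper)]
print(diffs)   # 1/2, 1/60, 1/2030, 1/68952: rushing toward 0
```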

So now we can fix what was wrong with Draft #2: for any two Cauchy sequences *a*_{1}, *a*_{2}, *a*_{3}, … and *b*_{1}, *b*_{2}, *b*_{3}, …, decree that lim(*a*_{1}, *a*_{2}, *a*_{3}, …) and lim(*b*_{1}, *b*_{2}, *b*_{3}, …) are two different Names for the same Thing precisely when the term-by-term differences *b*_{1} − *a*_{1}, *b*_{2} − *a*_{2}, *b*_{3} − *a*_{3}, … get closer and closer to 0.

If decreeing that things that look different are to be regarded as the same seems too much like double-think to you, you might prefer the bag theory way to think about what’s going on. Bag theory was mostly the creation of Georg Cantor, except that he called it set theory (“Mengentheorie” in German), and it’s a good thing that he did, since otherwise mathematicians like his nemesis Leopold Kronecker might have dismissed his work even more than they actually did. Nobody really calls it bag theory, but it’s really about bags – not physical bags, but the kind that exist in your head. Sets are bags that can contain any objects of thought that you care to put into them, including other bags. Cantor’s bags have been just as consequential in math as paperclips and manila folders were in the evolution of the early 20th century office. Paperclips and folders give you a way to stick things together whether they want to be stuck together or not, and Cantor’s bags do that for math. You want to make a bag that contains just the number 7, the number 24, and the number 365? Poof! No sooner have you thought of it than it exists. That is, it exists in your mind, because you put it there. And if you write “{7, 24, 365}” on a blackboard in front of three friends, then poof-poof-poof! It exists in their minds too. In Draft #3, we put lim(*a*_{1}, *a*_{2}, *a*_{3}, …) and lim(*b*_{1}, *b*_{2}, *b*_{3}, …) into the same bag precisely when the term-by-term differences of the two sequences get closer and closer to 0; each of these bags is a single real number.

Notice that we wind up with fewer Things in Draft #3 than we had in Draft #2, but not because we’re throwing any of our Cauchy sequences away; rather, we are lumping many of them together into our bags. Culling and lumping are different processes, but they work in the same direction: making collections smaller and more manageable. Moving from the rationals to Cantor’s model of the reals was sort of a one-step-forward, two-steps-back process, featuring a huge initial step forward (when we introduced infinite sequences of rational numbers) followed by two smaller steps back (first culling the non-Cauchy sequences, then lumping together those Cauchy sequences into bags). The end result was progress.

In Club Cantor terms, the bouncer will no longer admit individual sequences but will only admit complete gangs consisting of all the Cauchy sequences that have the same vibe, where two sequences are said to vibe together if the difference between the *n*th term of one and the *n*th term of the other goes to zero as *n* gets large. These gangs (like the bags) correspond precisely to the real numbers.

The square-root-of-two bag contains infinitely many names for sqrt(2). One of them is lim(1/1, 3/2, 7/5, 17/12, 41/29, 99/70, …). Each successive numerator and denominator is obtained from the previous pair by following a simple recipe. What could be nicer?

**PREJUDICE**

Well, some people aren’t happy. They learned about the real numbers via their decimal expansions, and if you can’t show them a pattern in the decimal expansion of a number they think there’s something fishy about the number. “Irrational numbers don’t exist!” they cry. They forget that decimal expansions are only one way of understanding real numbers, and a recent one at that; the ancient Greeks knew a lot about rational and irrational numbers (or as they called them, ratios) with nothing like the modern decimal system. Just as it’s silly to say that negative numbers don’t exist because in some contexts (say, counting sheep) negative numbers don’t apply, it’s silly to say that the square root of two doesn’t exist because in some representations it looks random. Listen: I know a way of writing numbers in which the ordinary number 1/2 gets represented by .0100000100100101000…, an unending sequence of 0s and 1s in which no patterns have been found; does that mean that there’s something suspicious about the ontological status of the number 1/2? Surely the lack of pattern is best seen as a case of a mismatch between the number we are trying to express and the number-system in which we are trying to express it.^{5} Focusing on what numbers look like in base ten is a decimal-centric prejudice that a properly-tutored young mathematician should be steered away from as early as possible, certainly by the age of ten. When it comes to representing numbers, God gave us the unary system; all the rest is human contrivance for human convenience.

Perhaps some of the modern animus against irrational numbers stems from the fact that they outnumber the friendly, familiar rational numbers. You may think “outnumber” is an odd word to use here; aren’t there infinitely many of both, and doesn’t infinity equal infinity? The surprising answer is that sometimes, infinity *doesn’t* equal infinity, or rather (to put things in a less provocative way) there’s more than one size of infinity. Cantor was the one who taught us this, and I’ll talk about his infinities in a future essay. But one thing I’ll say now, which sort of hints at why “most” real numbers are irrational, is that if you generate a real number at random by using successive decimal digits chosen by throwing a 10-sided die, you’re virtually certain to generate an irrational number. That’s because the sequence of digits you generate is virtually certain to be an “infinite monkey sequence”, that is, a sequence that contains every possible finite sequence of digits. On the other hand, the digits of a rational number must eventually settle down into a pattern which they repeat forever after. It’s not hard to show that an eventually-repeating infinite sequence of digits can’t be an infinite monkey sequence.^{6}

**ARITHMETIC WITH REALS**

Okay, so we’ve constructed the real numbers, or something that we hope will behave like the real numbers; how do we know that we’ve succeeded? We claim to have constructed the square root of two as a bag marked “square root of two” on the outside, with infinitely many names of the form lim(*a*_{1}, *a*_{2}, *a*_{3}, …) on the inside. But in what sense can we “square” this bag and obtain the bag marked “two” as the result?

Reach your hand into the bag and pull out some name lim(*a*_{1}, *a*_{2}, *a*_{3}, …). The sequence is guaranteed to satisfy the Cauchy property. Moreover, the sequence is going to bunch up around the same gap in the rational numbers where the sequence 1/1, 3/2, 7/5, 17/12, … does. “Square” the sequence *a*_{1}, *a*_{2}, *a*_{3}, … by squaring all the terms: *a*_{1}^{2}, *a*_{2}^{2}, *a*_{3}^{2}, …. It can be proved (with a bit of work or with a bit of cleverness) that this new sequence will satisfy the Cauchy property too, so the name lim(*a*_{1}^{2}, *a*_{2}^{2}, *a*_{3}^{2}, …) must be in one of our bags somewhere. And which bag will that be? The bag marked “two”! For instance, if the name that we pulled out of the square-root-of-two bag was lim(1/1, 3/2, 7/5, 17/12, …) itself, then squaring all the terms gives 1/1, 9/4, 49/25, 289/144, …, and since this sequence converges to 2, the name lim(1/1, 9/4, 49/25, 289/144, …) must be in the bag marked “two”.
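Here's that squaring carried out in exact rational arithmetic (a sketch; `next_conv` is my own name for one step of the recurrence (*a*, *b*) → (*a* + *b*, 2*a* + *b*) that generates the interleaved sequence):

```python
from fractions import Fraction

def next_conv(f):
    # one step of the recipe: b/a -> (2a + b)/(a + b)
    a, b = f.denominator, f.numerator
    return Fraction(2 * a + b, a + b)

f, squares = Fraction(1, 1), []
for _ in range(4):
    squares.append(f * f)
    f = next_conv(f)
print(squares)   # 1, 9/4, 49/25, 289/144: marching toward the bag marked "two"
```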

Here’s a general prescription for multiplying bags: To multiply two real numbers (call them *r* and *s*), reach into the “*r* bag” and pull out some name lim(*a*_{1}, *a*_{2}, *a*_{3}, …); likewise reach into the “*s* bag” and pull out some name lim(*b*_{1}, *b*_{2}, *b*_{3}, …); then form the infinite sequence *c*_{1}, *c*_{2}, *c*_{3}, …, with *c*_{1} = *a*_{1}*b*_{1}, *c*_{2} = *a*_{2}*b*_{2}, *c*_{3} = *a*_{3}*b*_{3}, etc.; and then find the bag that contains the name lim(*c*_{1}, *c*_{2}, *c*_{3}, …). The same prescription applies to adding two real numbers, except now we put *c*_{1} = *a*_{1} + *b*_{1}, *c*_{2} = *a*_{2} + *b*_{2}, *c*_{3} = *a*_{3} + *b*_{3}, etc.
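In code, the prescription is just term-by-term zipping. A Python sketch using exact rational arithmetic (the sample sequences are my own, chosen to echo the .9, .99, .999 names of “one” from earlier):

```python
from fractions import Fraction

def times(xs, ys):
    # multiply two real numbers name-by-name: c_n = a_n * b_n
    return [x * y for x, y in zip(xs, ys)]

def plus(xs, ys):
    # add two real numbers name-by-name: c_n = a_n + b_n
    return [x + y for x, y in zip(xs, ys)]

third      = [Fraction(3, 10), Fraction(33, 100), Fraction(333, 1000)]
two_thirds = [Fraction(6, 10), Fraction(66, 100), Fraction(666, 1000)]
print(plus(third, two_thirds))          # 9/10, 99/100, 999/1000: a name of "one"
print(times(third, [Fraction(3)] * 3))  # 9/10, 99/100, 999/1000 again
```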

I won’t take you through the details, but with this definition it’s easy to show that all the algebraic properties of the rational numbers carry over to the new number system we’ve built. For instance, we can use the fact that *a* + *b* = *b* + *a* for all rational numbers to prove that *r* + *s* = *s* + *r* for all real numbers. The principle of the permanence of form has been vindicated!

What’s more, we get a new non-algebraic property, the one we were hoping for: gap-free-ness, more properly called completeness. The real number system, unlike the rational number system, satisfies the Intermediate Value Theorem. So a line that passes into and out of a circle must cross the circle somewhere, and if you hike up a mountain one day and down the next, there must be a time of day when your altitude was the same on both days.

There’s only one more step, but in many ways it’s the most subtle one.

**ARROWS**

We’ve built a new number system, and it has lots of wonderful properties, but why do we call it an extension of the rational numbers? Isn’t it just some new number system, external to our old one?

It’s time for the final step in the process. We want to glue each rational number *a* to the corresponding bag: the one containing lim(*a*, *a*, *a*, …). The rational numbers must step into the picture they’ve painted, becoming characters in the fictional world they helped create.

But if we do this with our mathematical glue gun, confusion ensues. We’ve defined real numbers as bags of infinite sequences of rational numbers, but if those rational numbers are bags too, then we have bags of bags of bags of …

A better approach is to use what in category theory are known formally as “morphisms” and informally as “arrows”. Arrows give us a way to say that two things are the same (in some ways) and yet not the same (in other ways). In this case, the arrow points from the set of rational numbers to the set of real-number bags, and associates with each rational number *a* to its avatar in the real number system: the bag that contains the name lim(*a*, *a*, *a*, …). So, even though we didn’t construct our real numbers as a superset of the rational numbers, this arrow lets us think of the rational numbers as a subset of the real numbers.

I want to point out how truly sneaky all this is. What rescued our solve-the-problem-by-naming-it tactic from utter sophistry was that the new number system we constructed – the one that’s essentially made up of the names of problems that we can’t solve in our old number system – was external to the old number system. But then we magically inserted the old number system into the new one.

Part of what category theory gives us is an appropriately relaxed attitude about what things “really are”. This can be especially useful if we want to consider Cantor’s approach to constructing the real numbers alongside a different construction that was proposed by Richard Dedekind at almost the exact same time. Dedekind’s idea was that if you want to specify a particular irrational number, which is to say, a particular gap in the rational number line, it’s enough to specify which rational numbers are to the left of the gap and which rational numbers are to its right, so why not just define the irrational number as that particular way of breaking the rational number line into a left piece and a right piece? These are the famous “Dedekind cuts” (though the core idea derives from Eudoxus two millennia earlier). This is a different definition of the real numbers, and you might worry that from different definitions, different consequences will follow. But there’s nothing to worry about. Each “Cantor real number” corresponds to one and only one “Dedekind real number”, so the two constructions are only different in their internal workings, not different in terms of how they interface with the rationals. Cantor and Dedekind didn’t construct two different number systems; they constructed the same number system in two different ways.

Category theory gives us a way to say that the two systems are the same without speaking nonsense. There are arrows between the Cantor reals and the Dedekind reals, giving us a weakened form of equality which is all we really need. Category theory provides a sophisticated framework for voicing your apathy toward questions like “Are real numbers really infinite decimals, or Dedekind cuts, or gangs of Cauchy sequences with a common vibe?” You get to say “Who cares? They’re all isomorphic anyway.”

(Decades ago, President Bill Clinton tried to evade accusations of dishonesty by saying that the interpretation of an assertion he’d made depended “on what the meaning of the word ‘is’ is.” This may have been lawyerly weaseling, but it sounds a little bit like mathematics. Was Clinton a category-theorist?)

Neither Cantor’s construction nor Dedekind’s plays much of a role in the day-to-day work of mathematicians who study real numbers. Like the plumbing in a house, the details of how real numbers work are normally hidden from view so we can focus on other things. If we need to dig into the infrastructure and wrestle with specific real numbers, we’re likely to use infinite decimals as our go-to model of what real numbers “are”.

**BUILDING NUMBER-UNIVERSES**

So why bother constructing the real numbers at all, if you’re never going to use the details? One answer has to do with the 19th century crisis of faith in the foundations of mathematics. Sure, you could just posit all the properties you think real numbers should satisfy as axioms, but how do you know your axioms don’t harbor some subtle self-contradiction? The real number system definitely seems weirder than the rational number system; how do we know it hangs together logically? The payoff of building up the reals from the rationals is that it provides a proof of relative consistency: as long as your axioms for the rational numbers are consistent, your axioms for the real numbers must be consistent too.

But the main reason I’ve spent so much time on building the reals as a completion of the rationals is that it’s a trick that can be used in other contexts. In particular, if we adopt a different notion of what we mean by words like “distance”, “limit” and “converge” in the context of the rational numbers, we can construct infinitely many new number systems (the *p*-adic number systems, where *p* is any prime you like). We’ll do that later, but we don’t have to wait till then before seeing Cantor’s ideas construct a novel number system. Let’s go back to Club Cantor and give the bouncer a new, more permissive admission criterion. It’ll still be Cauchy-ish, but instead of measuring the distance between two rational numbers in the ordinary way, our kinder-gentler bouncer will measure the vertical distance between them in the following curiously warped number line:

This particular graph is the graph of the function *f*(*x*) = (sqrt(*x*^{2} + 1) − 1)/*x* (with *f*(0) = 0 by special stipulation), but the specific equation isn’t important; what matters is that it’s an increasing function with horizontal asymptotes *y* = 1 and *y* = −1. In this warping, the numbers 100 and 1000 are really close together (because the latter is only slightly higher than the former) and the numbers 1,000,000 and 1,000,000,000 are even closer together. Now sequences like 1, 2, 3, 4, … that go off to infinity are allowed into Club Cantor. Likewise 1, 10, 100, 1000, … and −1, −2, −3, −4, … and lots of other sequences that didn’t pass the Cauchy test before but do pass it now that we’re measuring distance in a different way.
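
To make the warped notion of distance concrete, here’s a minimal Python sketch (the names `f` and `warped_distance` are my own labels, not anything from the essay):

```python
import math

def f(x):
    """The warping function: increasing, with horizontal asymptotes
    y = 1 and y = -1 (and f(0) = 0 by special stipulation)."""
    if x == 0:
        return 0.0
    return (math.sqrt(x * x + 1) - 1) / x

def warped_distance(a, b):
    """How the kinder-gentler bouncer measures the gap between two
    numbers: the vertical distance between their images under f."""
    return abs(f(a) - f(b))

# 100 and 1000 are now close; 1,000,000 and 1,000,000,000 are even closer
d1 = warped_distance(100, 1000)
d2 = warped_distance(10**6, 10**9)
```

Under this distance the sequence 1, 2, 3, 4, … satisfies the Cauchy property: its terms eventually all huddle within any tolerance you name, because their images under *f* all crowd up against the asymptote *y* = 1.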

The sequences that head off to the right like 1, 2, 3, 4, … get a new bag, which we can call +∞, and the sequences that head off to the left like −1, −2, −3, −4, … get a different new bag, which we can call −∞. Nothing else changes; all the other bags are as before. But now we get a number system called the extended real numbers. Want to know what +∞ times −∞ is in this new number system? Reach into the +∞ bag and the −∞ bag and pull out a sequence from each; multiplying term-by-term, you’ll find you get a sequence from the −∞ bag, so +∞ times −∞ is −∞. What about +∞ plus −∞? Now you’ll find that the answer is indeterminate; the term-by-term sum of the two sequences you pulled from the two bags might belong to either of the two bags, or to one of the bags labeled by a real number, or to none of the bags at all (since it might not satisfy the Cauchy property).
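
Here’s a quick numerical sketch of the bag arithmetic above (the list names are mine): term-by-term products of representatives of +∞ and −∞ always land in the −∞ bag, while term-by-term sums depend on which representatives you happened to grab:

```python
# First few terms of representative sequences pulled from the two new bags
pos_inf = [n for n in range(1, 11)]        # 1, 2, 3, ...   (heads to +inf)
neg_inf = [-n for n in range(1, 11)]       # -1, -2, -3, ... (heads to -inf)

# Term-by-term product: -1, -4, -9, ... heads to -inf,
# and the same happens whichever representatives we pull
product = [a * b for a, b in zip(pos_inf, neg_inf)]

# Term-by-term sum: the answer depends on the representatives,
# which is what "indeterminate" means here
sum1 = [a + b for a, b in zip(pos_inf, neg_inf)]          # 0, 0, 0, ...
neg_inf_shifted = [-n - 5 for n in range(1, 11)]          # another -inf rep
sum2 = [a + b for a, b in zip(pos_inf, neg_inf_shifted)]  # -5, -5, -5, ...
```

The two sums land in two different bags (the bag for 0 and the bag for −5), which is exactly why +∞ plus −∞ gets no well-defined value.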

This last caveat hints at a detail I’ve swept under the rug (one of many such details): how do we know that in Cantor’s construction of the ordinary real numbers, the sum or product of two sequences satisfying the Cauchy property must also satisfy the Cauchy property? In presenting Cantor’s construction as glibly as I’ve chosen to do, I run the risk of making it seem simpler than it actually is.

I referred to the extended real number system as “new”, meaning not yet described by me in this essay, but it wasn’t new even back in Cantor’s day; Euler and others had made extensive informal use of +∞ and −∞ as a kind of shorthand for describing the behavior of functions. But it’s nice to know that we can extend the real numbers in such a way that what was formerly informal becomes literally true.

(As we’ll see, Cantor came up with a new theory of infinity, but it was much more original and outrageous than the extended real number system!)

**MORE BIRTHDAYS**

Now that you know the theme of this essay, maybe you can guess why I started by wishing you a happy January 48th. January 48th is an extra name we might give to February 17th. Normally we don’t think of January as going past the 31st, but if we posit that January the (*n*+1)st should always refer to the day after January the *n*th, then January the 32nd should be another name for February 1, January the 33rd should be another name for February 2, and so on. To convert January dates beyond the 31st into proper dates, just subtract 31 and replace “January” by “February”. And if the number you got by subtracting is bigger than the number of days in February, keep going with more subtraction. On the other hand, January 0 is another name for December 31, and you can go deeper into negative January from there.

This kind of half-nonsense is actually useful when you’re doing mental calendar calculations, as intermediate steps in figuring out when something in the past happened or when something in the future is going to happen. Suppose you go into a shop today and buy something that has a 30-day return policy. What’s the last day you can return it? Today is February 17th, so 30 days from now is February 47th; subtracting 28 (the number of days in February this year) we find that February 47th coincides with March 19th. (Note: I advise you to return your item no later than March 18th, since the shopkeeper may insist that February 17th counts as day 1, hence March 19th counts as day 31, which is one day too late. To read about a time this actually happened to me, see my essay “Impaled on a Fencepost”.)
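
The subtract-the-month’s-length bookkeeping is easy to mechanize; here’s a sketch (the function `normalize` is my own name for it), using 2023 since February 17th falls in a non-leap year in the example:

```python
from calendar import monthrange
from datetime import date, timedelta

def normalize(year, month, day):
    """Convert a supernumerary date like 'February 47, 2023' into a
    proper date by repeatedly subtracting the current month's length."""
    while day > monthrange(year, month)[1]:
        day -= monthrange(year, month)[1]
        month += 1
        if month == 13:
            month, year = 1, year + 1
    return date(year, month, day)

jan48 = normalize(2023, 1, 48)                  # January 48th = February 17th
feb47 = normalize(2023, 2, 47)                  # February 47th = March 19th
deadline = date(2023, 2, 17) + timedelta(days=30)   # the return deadline
```

The `deadline` computed by ordinary date arithmetic agrees with the normalized “February 47th”, confirming the mental calculation in the text.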

A different application of allowing months to have supernumerary days is that it gives you a second chance (and a third, and maybe more) at celebrating golden birthdays. In case you don’t know, a golden birthday has traditionally been defined as the unique birthday in which the age you have just reached equals the day of the month on which you were born. So for example if you were born on the first day of the month, you had a golden birthday when you turned one. And that golden birthday – which I’m sure you don’t remember since you were only one at the time – is the only golden birthday you’ll ever have.

Or, it would have been like that, until now! (I think my proposal to redefine golden birthdays is original with me.) With supernumerary days, you’ll get a golden birthday every thirty years or so. For instance, if you were born on January 1st, you were also born on December 32nd, and November 62nd, and October 93rd, and you’ll get extra golden birthdays when you turn 32, 62, and 93.
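
To see where 32, 62, and 93 come from, just accumulate the lengths of the months you’re counting back through; a sketch (using a common non-leap year’s month lengths):

```python
# Working backward from January 1st through the preceding months
back_months = [("December", 31), ("November", 30), ("October", 31)]

day = 1                         # born on the 1st of January
names = [("January", day)]
for month, length in back_months:
    day += length               # the same calendar day, renamed
    names.append((month, day))
# January 1 = December 32 = November 62 = October 93
```

Each entry in `names` is another name for the same birthday, and the day-numbers 32, 62, 93 are exactly the ages at which the extra golden birthdays arrive.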

So now you see a practical application of assigning different names to the same thing, and you don’t have to ask me “When will I ever be able to use this stuff?” My answer is: in thirty years or less. And when you prepare the cake and candles, don’t worry about how you’ll fit dozens of candles on your cake or stress about how much wax they’ll drip: even if you live to 127, seven candles will suffice. That’s because you can represent your age in binary instead of unary, using lit candles to represent 1s and unlit candles to represent 0s. Actually, Cantor (born on the 184th day of September) has a golden birthday coming up soon so when he turns 184, put an eighth candle on that cake. 184_{ten} can also be written as 10111000_{two}. A different name for the same number. Cantor would approve.

**ENDNOTES**

#1. The original term was German; the identity property of 1 was called “Einheit”, or one-ish-ness. Accordingly, the generic symbol for an identity element is the letter *e*.

#2. After defining Mathematics as the art of giving the same name to different things, Poincaré defined Poetry (by way of contrast) as the art of giving different names to the same thing. I think his dichotomy is unfair to both poetry and mathematics.

#3. Well, Hamilton was the first to treat fractions in a purely formal manner *on purpose*. One could credibly and depressingly argue that meaning-free manipulation of fractions has been taught and learned in classrooms for centuries.

#4. Cauchy was actually answering a different question when he formulated his convergence criterion in 1821; he just wanted a way of assessing whether or not a sequence converges if one doesn’t know in advance what the supposed limit is. Meanwhile, working on his own, Bernard Bolzano arrived at the same criterion in 1817 and had a clearer understanding than Cauchy of what the criterion was good for, so by rights it should be called the Bolzano property; but Bolzano did not publish his work and his advances did not come to light until long after his death.

#5. This system is credited to Alfréd Rényi, and is called the *β*-expansion of a number, where *β* can be any real number greater than 1. When *β* is ten, the *β*-expansion of a number is just its ordinary decimal expansion, and when *β* is a positive integer, the *β*-expansion of a number is its base-*β* expansion. Regardless of what *β* is, to find the *β*-expansion of a number *r*, subtract off the biggest power of *β* from *r* that you can, and then do the same to the remainder, and so on. For instance, to derive the *β*-expansion of 1/2 with *β* = 3/2, we subtract (3/2)^{−2} = 4/9 from 1/2, leaving us with 1/18; then we subtract (3/2)^{−8} from 1/18, leaving us with 217/13122; and so on. Recording which powers of 3/2 got subtracted gives us the *β*-expansion of 1/2: .0100000100100101000…. (See entry A360649 in the Online Encyclopedia of Integer Sequences for more terms.)
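
Rényi’s greedy recipe is easy to code up with exact rational arithmetic; a sketch (the function name is mine) that reproduces the digits quoted above:

```python
from fractions import Fraction

def beta_digits(r, beta, n):
    """First n digits of the greedy beta-expansion of 0 < r < 1:
    for k = 1, 2, ..., subtract beta**(-k) from r as many times as
    possible, recording the count as the k-th digit."""
    digits = []
    for k in range(1, n + 1):
        p = beta ** (-k)
        d = 0
        while r >= p:
            r -= p
            d += 1
        digits.append(d)
    return digits

# beta = 3/2: the expansion of 1/2 begins .0100000100100101000...
half_digits = beta_digits(Fraction(1, 2), Fraction(3, 2), 19)
```

When `beta` is ten the same loop recovers ordinary decimal digits, since each power of ten then gets subtracted up to nine times.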

#6. Suppose the eventually-repeating sequence “clears its throat” for *M* digits and then launches into a never-ending repeating pattern whose length is *N* digits. Then a block of *M* + *N* consecutive 0s cannot occur in the infinite sequence unless the repeating pattern of length *N* consists entirely of 0s, in which case a block of *M* + *N* consecutive 1s cannot occur in the infinite sequence. Either way, we see that an infinite sequence that eventually repeats forever after cannot be an infinite monkey sequence.

It hapneth also some times, that the Quotient cannot be expressed by whole numbers, as 4 divided by 3 in this sort, whereby appeareth, that there will infinitly come from the 3 the rest of 1/3 and in such an accident you may come so neere as the thing requireth, omitting the remaynder…

— Simon Stevin, The Tenth (1585)

Many people find fractions and decimals confusing, counter-intuitive, and even scary. Consider the story of the A&W restaurant chain’s ill-fated third-of-a-pound burger, introduced as a beefier rival of the McDonald’s quarter-pounder. Many customers were unhappy that A&W was charging more for a third of a pound of beef than McDonald’s charged for a quarter of a pound. And why shouldn’t they be unhappy? Three is less than four, so one-third is less than one-fourth, right?

Well, that’s what many of those aggrieved customers told the consultants who had been hired to find out why A&W’s “Third is the Word!” innovation had gone so disastrously awry. But I wonder if those customers were rationalizing (sorry…) after the fact. Maybe some of these people had had such bad experiences when learning about fractions in school (the awkward fraction 1/3 in particular) that they preferred to avoid eating at establishments that triggered their math anxiety.

Perhaps part of the problem is that for many people, the standard middle school curriculum on fractions and decimals doesn’t hang together well, with its mélange of different representations of things that they’re told are really the same thing under different names, such as 1 1/5 and 6/5 and 12/10 and 1.2 (and let’s not even mention 120%). And as if that weren’t bad enough, there are decimals that never end?!? It’s easy to come away from this experience confused and disheartened.

A common stumbling block, even before decimals come into the picture, is division of fractions. Ask a student “What’s 6 divided by 1/2?” and they’re likely to give the wrong answer 3 instead of the right answer 12. Part of what’s tripping them up is the way the phrase “divided by one-half” resembles the phrase “divided in half”, but a deeper issue is that the student often has no access to a mental model in which dividing one fraction by another is meaningful.

The education theorist Liping Ma opened my eyes to the complexities of teaching fractions in her book “Knowing and Teaching Elementary Mathematics”, which introduced me to different models of division. The *partitive* model of six-divided-by-two asks, “If you have six cookies to share among two children, how many cookies does each child get?” This model works well for 6 ÷ 2 but isn’t so helpful for making sense of an expression like 6 ÷ 1/2 in which the divisor (the number you’re dividing by, which in this case is the number of children) isn’t a whole number: how do you feed half a child?

The *quotative* model of six-divided-by-two asks, “If you have six cookies, and you want to share them among some children so that each child gets two cookies, how many children can you feed?” This model works well for 6 ÷ 1/2; if you have six cookies and you want each child to get half a cookie, then the number of children sharing the cookies should be twelve. But this model is less helpful when the quotient (the answer to the division problem, which in this case is the number of children) isn’t a whole number.

So how do we make sense of division of fractions when neither the divisor nor the quotient is a whole number, such as one-half divided by one-third?

Most students learn a procedure for dividing one fraction by another, handily summarized in the verse “Yours is not to reason why, just invert and multiply!”, where inverting a fraction means swapping the numerator and the denominator. The verse assures us that the scary expression *a*/*b* ÷ *c*/*d* equals the less-scary expression (*a*/*b*) × (*d*/*c*), or (*a*×*d*)/(*b*×*c*). But if you just apply a memorized rule, you’re letting the rule (or the people who devised it) do the thinking for you.^{2} And then you risk becoming one of those people who thinks a third of a pound of beef should cost less than a quarter of a pound of beef.

(In a blog that focused more on real-world issues, an essay on fractions would treat specific forms of innumeracy related to fractions. The coronavirus pandemic showcased many examples of this, such as when people focused on case *counts* when they should instead have attended to case *rates*. Knowing when to use denominators, and just as crucially knowing what denominator to use, is a huge part of mathematical literacy, or as it’s come to be called, numeracy. But that’s not my beef today.)

A teacher explaining why one-half divided by one-third is one-and-a-half might make use of the quotative model: when you’ve got a girl and a boy who each want a third of a pizza (two slices) but you’ve only got half a pizza (three slices), if you give the girl her quota the boy will only get half of his. So in that sense there’s enough pizza for one-and-a-half children.

That’s a good approach – one that’s grounded in the kind of concrete situation that fractions were introduced to handle. But let’s see how a person of an oddly schematic cast of mind might approach the problem, not because of what this will tell us about fractions, but because of what it will tell us about mathematicians, and more specifically, about how mathematicians think when negotiating unfamiliar terrain – because we’ve got a lot of unfamiliar terrain coming up in future essays.

**THE AMNESIC MATHEMATICIAN**

Imagine an amnesic mathematician who’s forgotten how to divide fractions but remembers one important thing about dividing a whole number (call it *x*) by another whole number (call it *y*): for any (nonzero) whole number *m* (call it the scaling factor), the quotient (*m* × *x*) ÷ (*m* × *y*) is equal to the quotient *x* ÷ *y*. For instance, 60 ÷ 20 = (10×6) ÷ (10×2) = 6 ÷ 2. You might call this the scaling property of division. It can simplify division by letting us cancel common factors.

To apply the scaling property to 1/2 ÷ 1/3, we perform the scaling trick in the opposite direction: instead of scaling down two big whole numbers to get a simpler problem involving smaller whole numbers, we can scale up two fractions to get a simpler problem involving, not fractions, but whole numbers.

When *x* is 1/2 and *y* is 1/3 , the savvy choice of scaling factor turns out to be *m* = 6, so that *m* × *x* is 6 times 1/2, or 3, while *m* × *y* is 6 times 1/3, or 2: both whole numbers. Then *x* ÷ *y* = (*m*×*x*) ÷ (*m*×*y*) = 3 ÷ 2 = 3/2.^{3} (Here I’m skipping over some issues that a good teacher would have to address, such as the relationship between *x* ÷ *y* and *x*/*y* and *x* × 1/*y.*)
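
Python’s exact-fraction arithmetic makes a handy sanity check of the Amnesic Mathematician’s maneuver (a sketch of my own, not anything from the essay):

```python
from fractions import Fraction

x, y = Fraction(1, 2), Fraction(1, 3)
m = 6                      # the savvy scaling factor: clears both denominators

scaled_x, scaled_y = m * x, m * y      # 3 and 2: both whole numbers now
quotient = scaled_x / scaled_y         # 3 divided by 2, i.e. 3/2
```

The scaling property says `quotient` must equal `x / y`, and exact arithmetic confirms that it does.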

The scaling-property approach to dividing one fraction by another isn’t completely different from the model-based approach; in particular, scaling numbers up by a factor of 6 corresponds to telling a student “Try counting slices instead of pizzas.” But the Amnesic Mathematician is operating in a realm of pure number, with no kids or pizza in sight – just symbols (*x* and *y*) that are general in form though specific in their nature.

**THE PRINCIPLE OF PERMANENCE**

If that last phrase (“general in form though specific in their nature”) struck you as a little old-fashioned (in addition to being a bit obscure), that’s because I stole it from a book written over a century and a half ago: George Peacock’s “A Treatise on Algebra”. In it, Peacock enunciated a principle he called the Permanence of Form. The Principle says that when we’re trying to extend the operations of arithmetic from some number system (such as the counting numbers) to some larger number system (such as the fractions), we should assume that any algebraic formula that holds true in the smaller number system (such as (*m*×*x*)÷(*m*×*y*) = *x*÷*y*) will hold true in the larger number system as well. This principle isn’t a mathematical fact, and indeed it has many exceptions, of which the most historically important may be William Rowan Hamilton’s quaternionic number system (ironically, invented by Hamilton at about the same time as Peacock wrote his book): in inventing the quaternions, Hamilton had to ditch the commutative law of multiplication (*x*×*y* = *y*×*x*). When you apply Peacock’s principle, it’s important to keep in mind that it’s not an infallible guide, but when it’s wrong, it’s wrong for an important reason, and the reason is worth understanding.

It turns out that what the Amnesic Mathematician did for 1/2 ÷ 1/3 (determining its value not by appealing to a model situation in which the division makes sense but by assuming that general properties of division of counting numbers will apply to fractions as well) can also be done for 1/2 + 1/3, 1/2 – 1/3, and 1/2 × 1/3, or indeed for the sum, difference, product, and quotient of any two (positive) fractions. Our Amnesic Mathematician can go on to prove that there’s one and only one way to extend the operations of addition, subtraction, multiplication, and division from the realm of counting numbers to the realm of fractions while preserving properties like (*m*×*x*)÷(*m*×*y*) = *x*÷*y* and *m*×(*x*+*y*) = *m*×*x*+*m*×*y*. And the resulting way of adding, subtracting, multiplying, and dividing fractions, although derived from purely formal considerations, turns out to be the right way to do arithmetic with fractions in contexts where those operations have meaning – even though our Amnesic Mathematician was not attending to meaning at all, but merely looking at formal properties of counting-number arithmetic and guessing that they extended to fractions.

Let’s apply the Principle to another problem: figuring out what 9^{1/2} should mean. We know that when the exponents *m* and *n* are counting numbers, 9^{m} times 9^{n} equals 9^{m+n}. Let’s make the brave guess that this equation is true even when *m* and *n* are fractions, and more specifically, when *m* and *n* equal 1/2. So 9^{1/2} times 9^{1/2} should equal 9^{1/2+1/2}, which equals 9^{1}, which is 9. This tells us that 9^{1/2} should be a number that, when squared, gives 9; that is, 9^{1/2} should be 3 (the square root of 9). The Principle of Permanence of Form predicts that in contexts where fractional exponents have some sort of meaning, the value of 9^{1/2} will turn out to be 3. That is, even before we know what the question “What is the value of 9^{1/2}?” *means*, the Principle gives us a way to divine the answer! This is magic of a high order.
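
A one-line numerical check of the Principle’s prediction (floating-point square roots happen to be exact for perfect squares, so the comparisons below are safe):

```python
half_power = 9 ** 0.5          # the candidate value of 9^(1/2)

# 9^(1/2) times 9^(1/2) should be 9^(1/2 + 1/2) = 9^1 = 9
square = half_power * half_power
```

As predicted, `half_power` is 3 and its square recovers 9.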

There’s an odd dance between meaning and form. Form without meaning is incomplete, but even before meaning attaches itself to form, form can point the way towards meaning. That’s why mathematicians trust form even when meaning isn’t available; they stumble forward using form instead of meaning, hoping that their guesses are right and that the properties they are using (such as the scaling property for division) are valid in the new country they’re exploring. Sometimes those properties are too permissive to provide clear answers in the new country. Other times the properties are too stringent and admit no consistent answers. But every now and then, properties imported from the old country yield univocal, consistent answers in the new country. In that case, mathematicians treat this univocality as a sign that they’re on the right track.

**WHAT CHANGES**

An important property of the counting numbers that lies outside the purview of the Principle of Permanence of Form is the Archimedean property: Given two counting numbers *m* and *n*, no matter how disparate in size they are, if you add enough *m*’s together you can get a sum at least as big as *n*, and vice versa. The older I get, the more profound I think the Archimedean Property is, not just as a mathematical assertion but as an assertion about the observable universe. We study quarks and we study galaxies, and they’re very different from each other, but they occupy a common scale, with human beings somewhere in the middle. Maybe there are things that are infinitely smaller than quarks or infinitely larger than galaxies, but how could we ever come to know about them? It seems to me that the Archimedean property of the counting numbers in a way corresponds to fundamental limits on our ability to probe the universe with our finite bodies and minds.

It turns out that the Archimedean property persists when we include fractions: given two fractions *p*/*q* and *p*′/*q*′, adding *p*/*q* to itself *p*′×*q* times gives (*p*/*q*)×(*p*′×*q*) = *p*×*p*′, which is at least as big as *p*′/*q*′ (since *p*×*p*′ ≥ *p*′ ≥ *p*′/*q*′ when *p*, *q*, *p*′, *q*′ are all positive integers), and similarly when the two fractions reverse roles.

Another important fact about the set of counting numbers is that it is discrete. Putting it concretely, each counting number *n* has a successor *n*+1, and there are no other counting numbers in between (despite fanciful whimsies about “bleem” and “bleen” – see last month’s essay). So you might guess that the set of fractions is similar, albeit in a more compressed way with each fraction and its successor being much closer together. But you’d be very wrong.

Before we go on to talk about the strange topography of the set of fractions, let’s adopt the word mathematicians use to embrace both whole numbers and fractions: rational numbers. “Rational” just refers to a number that’s a ratio of integers (excluding division by zero, of course). Notice that all counting numbers are rational, since each counting number *n* can be written as the ratio (or fraction) *n*/1. I’m choosing to ignore negative fractions and zero for the time being, since humanity invented zero and negative numbers after fractions. So in this essay, when I talk about rational numbers, I’ll always mean positive rational numbers.

So now I can ask: What’s the first rational number that’s bigger than 1? Is it 101/100? No; 102/101 is smaller than 101/100 while still being bigger than 1. In fact, if you name any fraction *p*/*q* that’s bigger than 1, the fraction (*p*+1)/(*q*+1) is ever-so-slightly smaller while still being bigger than 1. So there’s no first rational number after 1. And 1 is not alone in this regard. Pick any rational number *p*/*q* that you like, and any slightly larger rational number *r*/*s*. *r*/*s* isn’t the smallest rational number that’s bigger than *p*/*q*; for instance, (*p*+*r*)/(*q*+*s*) comes in between.^{4} In fact, there are infinitely many rational numbers between *p*/*q* and *r*/*s*, no matter how close *p*/*q* and *r*/*s* are!
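
The in-between fraction (*p*+*r*)/(*q*+*s*) is known as the mediant; a quick check with exact arithmetic (the helper name is mine):

```python
from fractions import Fraction

def mediant(a, b):
    """Add numerators and add denominators -- NOT ordinary fraction
    addition -- to get a fraction strictly between a and b."""
    return Fraction(a.numerator + b.numerator, a.denominator + b.denominator)

one, candidate = Fraction(1), Fraction(101, 100)
squeezed = mediant(candidate, one)      # (101+1)/(100+1) = 102/101
```

Repeating the squeeze forever shows there can be no first rational number after 1: each mediant is bigger than 1 but smaller than the previous candidate.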

We summarize the situation by saying that the set of rational numbers is *dense*, which means that it’s infinite in a very strong way: every interval in the number line contains infinitely many rational numbers. The set of counting numbers is infinite too, but at least it has the decency to do its being-infinite thing out beyond the zillions, where we don’t have to look at it happening — whereas the set of rational numbers flaunts its infinitude right under our noses, everywhere we look.

**THE DECIMATION OF THE RATIONALS**

When it comes to how people treat rational numbers, I divide the modern world into three subcultures: Science, Math, and Real Life. In a table of physical constants (in Science), is the standard acceleration of free fall listed as 9 4/5 (or 49/5 or 98/10) meters per second squared? Never; it’s always listed as 9.8 meters per second squared (unless more precision is required). In a cookbook (in Real Life), would you see a recipe that calls for 1.5 cups of flour, or 3/2 of a cup of flour? Maybe your cookbook does, but in every cookbook I’ve ever seen, it’s 1 1/2 cups. And in a geometry textbook (in Math), would you see a formula that gives the area of a triangle of base *b* and height *h* as .5*bh*, or the volume of a ball of radius *r* as 1 1/3 *πr*^{3}, or worse, 1.3*πr*^{3}? (If you’ve forgotten, 3 is short for infinitely many 3’s.) No; it’d be 1/2 *bh* and 4/3 *πr*^{3}, respectively.

There are good reasons why these subcultures have adopted their respective conventions, and as long as we don’t get confused about which culture we’re in, all is fine. But trouble can arise when perfectly nice fractions get written as non-terminating repeating decimals; for instance, 1/17 is 0.05882352941176470588…, with a repeating block of 16 digits, while 1/2023 requires a repeating block of 816 digits.
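
Those block lengths are easy to verify: the length of the repeating block of 1/*q* is the multiplicative order of 10 modulo *q*, once factors of 2 and 5 (which only affect the non-repeating part) are stripped out. A sketch (function name mine):

```python
def period_length(q):
    """Length of the repeating block in the decimal expansion of 1/q
    (0 means the decimal terminates)."""
    for p in (2, 5):              # factors of 2 and 5 only shift digits
        while q % p == 0:
            q //= p
    if q == 1:
        return 0
    k, r = 1, 10 % q              # find the order of 10 modulo q:
    while r != 1:                 # the smallest k with 10**k = 1 (mod q)
        r = (r * 10) % q
        k += 1
    return k
```

The loop works because the long-division remainders for 1/*q* cycle exactly when a power of 10 returns to 1 modulo *q*.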

When you insist on limiting yourself to fractions whose denominators are powers of ten (or divisors of powers of ten), as is required by the decimal system popularized in Europe by Simon Stevin in the 1500s, you drastically cull the set of permitted fractions. Most rational numbers can’t be expressed precisely as decimal fractions but can only be approximated. 4/3 is close to 13/10 (aka 1.3), closer to 133/100 (aka 1.33), closer still to 1333/1000 (aka 1.333), etc. The good news is that when the number you’re approximating is rational, there’s always a pattern to the sequence of digits of ever-better, never-perfect approximations; if you’re patient enough, the pattern of the digits will repeat from some point onward.

So we write 4/3 as 1.3 or 1.333… as part of the decimal game. But when we do this, we’re changing the game; unlike a terminating decimal, which is a shorthand for a fraction whose denominator is a power of ten, a non-terminating decimal is a new kind of thing, and if we don’t say what “1.333…” is supposed to mean, then assertions involving that expression, like “1.333… > 1”, aren’t meaningful. It’s fine to say “the dot-dot-dot stands for infinitely many 3’s,” but that’s just restating, rather than answering, the question of what “1.333…” really means.

I’ll return to the question of meaning later. But for now, if we want to duck the question and just use Principle-of-Permanence magic to figure out what the value of 1.333… *should* be, we could posit that multiplying a non-terminating decimal by ten amounts to shifting the decimal point one place to the right (as is the case for terminating decimals); then *x* = 1.333… implies 10*x* = 13.333…; and if (invoking the Principle of Permanence again) we posit that subtraction for non-terminating decimals works like subtraction for terminating decimals, then subtracting the first equation from the second gives 9*x* = 12.000… = 12, or *x* = 12/9 = 4/3.^{5}
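
The shift-and-subtract maneuver generalizes to any eventually-repeating decimal; here’s a sketch (the function name and argument convention are mine): pass the non-repeating part as a string and the repeating block separately.

```python
from fractions import Fraction

def repeating_to_fraction(nonrep, block):
    """Value of the decimal nonrep followed by block repeated forever,
    e.g. repeating_to_fraction("1.", "3") is 1.333... .
    If there are s digits after the point before the block and the block
    has t digits, then (10**(s+t) - 10**s) * x is an integer."""
    intpart, _, frac = nonrep.partition(".")
    s, t = len(frac), len(block)
    n0 = int(intpart + frac) if intpart + frac else 0
    n1 = int(intpart + frac + block)
    return Fraction(n1 - n0, 10 ** (s + t) - 10 ** s)
```

For 1.333… the function carries out exactly the 9*x* = 12 computation from the text, and exact arithmetic reduces the answer to 4/3.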

If you provisionally accept that 1.333… = 4/3, then you should accept that 0.333… = 1/3, as is taught in schools everywhere. I wonder: Has any student ever maintained that the fraction 1/3 doesn’t really exist, because you can’t finish writing down its decimal expansion? That would be historically perverse, since fractions predate decimals by thousands of years. It would also be “decimal-chauvinist”, since the fact that 1/5 has a terminating decimal expansion while 1/3 doesn’t is purely a result of the fact that the base we use, namely ten, is divisible by 5 but not 3; on a planet peopled by extra-fingered humanoids who use base twelve, 1/3 would have a terminating duodecimal expansion while 1/5 would not. Would it make sense to say that on our planet, 1/5 exists and 1/3 doesn’t, but that on their planet, 1/3 exists and 1/5 doesn’t?

(This may seem like a strange straw man for me to attack, but as we’ll see in my upcoming essay about real numbers, there are people who say things nearly as silly about the square root of two.)

A different obstacle to understanding 1/3 = 0.333… is a common confusion between the *process* represented by an expression and the *value* represented by that expression. As processes, the two sides of the equation are very different. But then again, so are the two sides of the equation 2×4 = 3+5, and even the two expressions 6÷3 and 4÷2 denote different processes. When we write 6÷3 = 4÷2, we aren’t asserting that the two division processes are the same; we’re saying that the two expressions 6÷3 and 4÷2 are different names for the same entity in the realm of natural numbers. Likewise, when we write 3/6 = 2/4, we’re saying that the two expressions are different names for the same entity in the realm of rational numbers. Different-looking fractions can represent the same rational number. It’s in that sense that 1/3 and 0.333… are asserted to represent the same rational number.

But now we come to a thornier issue than 1/3 = .333…. If we accept that equation as being both meaningful and true, then (invoking the Principle of Permanence again) we’d expect that we can triple both sides of the equation, obtaining 0.999… = 1, and that seems impossible. The numbers 0.9, 0.99, 0.999, etc. are all less than 1; how could 0.999… suddenly become equal to 1? Many students wonder about this. After all, a valid way to decide which of two terminating decimals is larger is to find the first digit at which they disagree; whichever decimal has the larger digit there is the larger number. Call this the first-discrepancy test. Some intuitive version of the Principle of Permanence makes students think that this first-discrepancy test should apply to non-terminating decimals as well.^{6} And that’s okay! As a teacher, I prefer principled dissidence (“I don’t think .999… is equal to 1”) to muddled conformity (“I guess it means that the difference eventually becomes too small to matter”) or outright indifference (“Who cares?”).

I wrote about the mystery of .999… twice before, in my essays The one about .999… and More about .999…. (Third time’s the charm?)

**DOPPELGANGERS EVERYWHERE**

What if we rejected the consensus and tried to develop an alternative theory of decimals in which 0.999… was actually less than 1, so that the number 1 had an evil twin, a doppelgänger? After all, one of the themes of this blog is that math isn’t about following rules that other people decreed; it’s about following ideas (including ideas you dreamed up yourself) to wherever they lead us, and the only ironclad rule is that you have to accept the consequences of your choices.

So, what are the consequences of .999… < 1? Well, to start with, 1 isn’t the only number with a doppelgänger. We also have 1.999… (2’s evil twin) and 2.999… (3’s evil twin) and so on: infinitely many doppelgängers, one for each counting number!

But it’s worse than that. 1/2 has a doppelgänger too: .4999… And 17/20, which we’d normally write as 0.85, has 0.84999… as a doppelgänger. In fact, every rational number whose denominator is the product of a power of two (1 or 2 or 4 or 8 or …) and a power of five (1 or 5 or 25 or 125 or …) will give rise to a terminating decimal, which in turn will have a doppelgänger that you get by decreasing the last (nonzero) digit by 1 and sticking “999…” afterwards. Those doppelgängers won’t just be infinite in number on the number line as a whole: they’ll be dense. That is, they’ll infinitely infest every tiniest piece of the number line.
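
The recipe for manufacturing doppelgängers can be mechanized; a sketch (the name is mine, and it assumes a terminating decimal string with a decimal point and at least one nonzero digit):

```python
def doppelganger(dec):
    """Evil twin of a terminating decimal string: decrement its last
    nonzero digit and append '999...'."""
    digits = list(dec)
    i = len(digits) - 1
    while digits[i] not in "123456789":   # skip trailing zeros and the point
        i -= 1
    digits[i] = str(int(digits[i]) - 1)
    return "".join(digits[:i + 1]) + "999..."
```

In the consensus theory each output names the same number as its input; in the alternative theory being entertained here, every such pair splits into twins.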

But it’s even worse than that. Peacock’s Principle of Permanence demands that you should be able to express 1.000… minus 0.999… in your system, and the result shouldn’t be 0 (since subtracting two unequal numbers can’t give zero). How will we represent that as a decimal? 0.000…1 perhaps? Likewise, if 0.999… is really less than 1, then there should be a number that’s halfway between 0.999… and 1.000…; would that be 0.999…5? But now we’ve changed the game in a serious way, allowing not just infinitely many digits after the decimal point but also digits that come *after* those infinitely many digits. How can something come after infinity?^{7}

I asked ChatGPT, the modern apotheosis of unjustified self-confidence, to prove that .999… is less than 1. Its reply began “Here is a proof that .999… is less than 1.” It then proceeded to show (using familiar arguments) that .999… is equal to 1, before majestically concluding “But our goal was to show that .999… is less than 1. Hence the proof is complete.” This reply, as an example of brazen mathematical *non sequitur*, can scarcely be improved upon.^{8}

**SOMETHING’S GOTTA GIVE**

We’ve seen that there’s a tension between different intuitions about how non-terminating decimals should behave. If we accept that the first-discrepancy test applies to non-terminating decimals, we’re led to believe that .999… is less than 1. On the other hand, if we accept that 10 times .999… is 9.999… and that 9.999… minus .999… is 9 and that the equation 9*x* = 9 has only the solution *x* = 1, we’re led to believe that .999… is equal to 1.
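You can watch the second intuition at work on the finite truncations .9, .99, .999, etc. Here’s a little Python experiment of my own (not part of any official argument) using exact rational arithmetic:

```python
from fractions import Fraction

# s = .99...9 (n nines) = (10^n - 1)/10^n, an exact rational
for n in (1, 2, 3, 10):
    s = Fraction(10**n - 1, 10**n)
    # the textbook manipulation, applied to the truncation:
    # 10*s minus s is 9*s, and 9*s falls short of 9 by exactly 9/10^n
    assert 10 * s - s == 9 * s
    print(n, 9 - 9 * s)   # the shortfall shrinks toward 0 as n grows
```

For the truncations, 9·s isn’t 9; the manipulation only gives 9 exactly “in the limit”, which is where the whole dispute lives.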

I do know of a number system in which .999… isn’t equal to 1.000…. It arises semi-naturally from sandpile models; I haven’t written about it before, though I once gave a talk about it at a Gathering 4 Gardner convention. One big problem with this number system is that there’s no subtraction or division – just addition and multiplication. Peacock would be most displeased.^{9}

Even if you could find a number system in which .999… < 1.000… that’s better than the one I found – richer in the range of algebraic operations it permits, more endowed with the forms of structural beauty that mathematicians prize – applied mathematicians probably won’t care at all, and even most pure mathematicians would regard your system as a mere curiosity. That’s in part because your system would violate the Archimedean Property: the difference between .999… and 1.000… would be less than 1/10, less than 1/100, less than 1/1000, etc.; or, putting it differently, the difference between .999… and 1 would be so tiny that, no matter how big a power of 10 you multiplied it by, the product would still be less than 1. Losing the Archimedean property in exchange for enforcing the conviction that decimals that *look* different should *be* different would strike most mathematicians as a poor trade.^{10}

There are non-Archimedean number systems, to be sure, and we’ll meet a few in future months. But as we’ll see, the real number system, although equipped with far more numbers than the system of counting numbers or the system of rational numbers, maintains the Archimedean property. So a variant number system that extends the counting numbers but has .999… < 1.000… will be asked what problems it solves better than the real number system does.

But still, what does “.999…” mean? It’s time to stop ducking the central question.

**OF UBS AND NUBS**

Regardless of what your pre-college teachers told you, your teachers’ teachers (in college math courses) almost certainly told them that .999… is defined through a “limiting process”. That is, “.999…” means “the limit approached by the infinite sequence .9, .99, .999, …” Or perhaps they were told that “.999…” denotes the infinite sum 9/10+9/100+9/1000+…, where infinite sums, upon further discussion, turn out to be defined in terms of limits. What they meant is that the numbers 9/10, 99/100, 999/1000, etc. approach 1 in the limit as the number of 9’s goes to infinity.
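A quick Python check (my own illustration, nothing more) makes the approach-to-1 claim concrete: after adding n terms of 9/10 + 9/100 + 9/1000 + …, the partial sum falls short of 1 by exactly 1/10^n:

```python
from fractions import Fraction

partial = Fraction(0)
for n in range(1, 8):
    partial += Fraction(9, 10**n)      # add the next term 9/10^n
    gap = 1 - partial
    assert gap == Fraction(1, 10**n)   # the gap is exactly 10^(-n)
print(partial)   # 9999999/10000000, within one ten-millionth of 1
```

No finite partial sum ever reaches 1, but the gap can be made smaller than any positive number you name; that is what “the limit is 1” asserts.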

But limits are a subtle concept (what does “going to infinity” even mean?), so this explanation sometimes lands on half-understanding ears. Some students, who don’t quite get it but who metacognitively *get* that they don’t get it, will add a further layer of equivocation: instead of saying “The sequence approaches 1” or “The limit is 1,” they’ll use the mixed locution “The limit approaches 1” (which is kind of like saying “The name of Shakespeare’s last play is called The Tempest”).

Fortunately there’s an alternative way to express what mathematicians mean by .999… without recourse to the limit concept, using the notion of the *least upper bound* of a set of numbers. (Sometimes “least upper bound” is abbreviated as “lub”, though the abbreviation is still pronounced as “least upper bound”.) We cut the number line into two pieces. On the right are the rational numbers like 1 and 3/2 and 17 that are bigger than 9/10 *and* bigger than 99/100 *and* bigger than 999/1000 and so on – that is, rational numbers that are bigger than every fraction of the form (10^{n}−1)/10^{n}; call these numbers *upper bounds* of our sequence, or *ubs* for short. On the left are all the other rational numbers, the *non-upper-bounds*, or *nubs*.

Anyway: If a number is an ub, every number to its right on the number line is an ub, while if a number is a nub, every number to its left on the number line is a nub. (That’s going to be a really confusing sentence in the audio version of this essay if I don’t pronounce it carefully, but never mind.) The division of the number line into ubs and nubs is thus an example of what mathematicians call a cut of the rationals into a “right half” and a “left half”.

Now comes the big question: What’s the boundary between the ubs and the nubs? The answer is 1: 1 is bigger than 9/10, 99/100, 999/1000, etc., but if *p*/*q* is any rational number that’s less than 1, there’s a number in the sequence 9/10, 99/100, 999/1000, … that’s bigger than *p*/*q*.^{11} So 1 is the smallest ub. That is, it’s the least upper bound.

And *that* is the way mathematicians make sense of the non-terminating decimal .999… : it’s the smallest number that’s bigger than all the approximations .9, .99, .999, …
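The argument sketched in endnote 11 is constructive, and we can act it out in a few lines of Python (the function name `beats` is my own invention): given any rational target below 1, it finds a member of the sequence 9/10, 99/100, 999/1000, … that exceeds it, which is exactly why no rational less than 1 can be an upper bound.

```python
from fractions import Fraction

def beats(target: Fraction) -> Fraction:
    """Return the first fraction (10^n - 1)/10^n in the sequence
    9/10, 99/100, 999/1000, ... that exceeds the given target < 1."""
    assert target < 1
    n = 1
    while Fraction(10**n - 1, 10**n) <= target:
        n += 1
    return Fraction(10**n - 1, 10**n)

print(beats(Fraction(1, 2)))            # 9/10 already beats 1/2
print(beats(Fraction(314159, 314160)))  # 999999/1000000 beats it
```

The loop always halts: once 10^n exceeds the target’s denominator, 1 − 1/10^n exceeds the target.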

Perhaps you feel that mathematicians are cheating: choosing the definition that leads them to the conclusion that they were hoping to arrive at. There’s a lot of truth to this. Mathematicians began writing things like 1/2 + 1/4 + 1/8 + … = 1 long before they developed the concept of limits or the concept of a least upper bound. Mathematicians of the 19th century developed these concepts to support the beautiful and powerful work on calculus that Newton and Leibniz and Euler and others had come up with more than a century earlier but which rested on shaky foundations. I admit it: we retconned^{12} the definition of infinite sums to support the conclusions of Newton, Leibniz, and Euler, and then for good measure we went further back in history and retconned non-terminating decimals. In our defense, we weren’t overturning the original intent of the inventors of decimal notation; they were practical people who focused on terminating decimals and weren’t too clear about what .333… meant. So 19th century mathematicians felt free to tell their predecessors “Here’s what you should have meant.”

If you want to develop a rival theory of non-terminating decimals and infinite sums and whatnot, the math police won’t break into your office and confiscate your notebook to stop you. But you’ll definitely have an easier time convincing others of the value of your theory if you’ve got something at least as good as the calculus of Newton and Leibniz as a spin-off. (Time travel, maybe?)

*Thanks to Richard Amster, Jeremy Cote, Sandi Gubin, Ben Orlin, Henri Picciotto, Evan Romer and Glen Whitney.*

**ENDNOTES**

#1. If any of you know of other early appearances of infinite decimals in mathematics, please let me know in the Comments!

#2. On the other hand, once you learn to think of division as the inverse of multiplication and multiplication as the inverse of division, then the problem of finding *a*/*b* divided by *c*/*d* is recast as the problem of solving the equation *x* × (*c*/*d*) = (*a*/*b*). If we multiply both sides by *d*/*c*, then the left side becomes *x* and the right side becomes (*a*/*b*) × (*d*/*c*). So now you can interpret the verse differently: what’s being inverted isn’t the fraction but the operation of division (whose inverse is multiplication). More importantly, you now have a better understanding of why the rule works.

#3. If asked to find a general formula for *a*/*b* ÷ *c*/*d*, the Amnesic Mathematician will proceed as follows: Scale up both fractions by *b*×*d* so that *a*/*b* becomes *a*×*d* and *c*/*d* becomes *c*×*b*; and then perform division on those two counting numbers, obtaining (*a*×*d*)/(*c*×*b*).
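The Amnesic Mathematician’s recipe is easy to spot-check by machine. Here’s a quick Python sanity test (my own, using exact rational arithmetic) confirming that (*a*/*b*) ÷ (*c*/*d*) always equals (*a*×*d*)/(*c*×*b*):

```python
from fractions import Fraction
from random import randint

# Check the recipe on many random fractions:
# (a/b) / (c/d) should equal (a*d) / (c*b).
for _ in range(1000):
    a, b, c, d = (randint(1, 50) for _ in range(4))
    assert Fraction(a, b) / Fraction(c, d) == Fraction(a * d, c * b)
print("recipe verified on 1000 random examples")
```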

#4. The fraction (*p*+*r*)/(*q*+*s*) is called the *mediant* of the fractions *p*/*q* and *r*/*s*. Sometime soon I’ll write a Mathematical Enchantments essay about the mediant operation and how it relates to other pretty things like the (mis-named) Farey fractions, Ford circles, the Stern-Brocot tree, the Calkin-Wilf process, and the curious fact that, in a strange but mathematically well-defined sense, the average positive rational number is 3/2.

#5. Some students feel that 10 times 1.333… shouldn’t end in infinitely many 3’s; rather it should end in infinitely many 3’s followed by a 0. But what does that even mean?

#6. In some future month, when I introduce the nonstandard reals, you’ll learn about a context in which this intuition actually holds water, although not quite in the way those students imagine.

#7. If you’ve seen infinite ordinals, then you know that there’s a way to make sense of going “beyond infinity” (or at least going beyond the smallest infinity), so you might think that this provides a way to rescue the vision of an arithmetic in which 0.999… is less than 1. But this is starting to look awfully complicated…

#8. Curiously, shortly after ChatGPT gave me this answer, the chat session terminated unexpectedly, and when I started a new session and repeated my question, ChatGPT gave me a more sensible answer; no matter how strongly I prompted it, it wouldn’t repeat its earlier bogus answer. I know ChatGPT is just a predictive language model, but it was hard to avoid the sensation that this predictive language model was ashamed of its earlier response.

#9. This system, which others have discovered before, is just the formal arithmetic of decimal numerals in which carries (called “firings” in the world of sandpiles or “explosions” in James Tanton’s Exploding Dots pedagogy) take place as much as one needs, even out to infinity. If you add 0.999… to itself, with infinitely many firings (from left to right), you get 1.999…, which is also what you get from adding 1.000… to 0.999…. So in this number-system-wannabe, the equation *x* + 0.999… = 1.999… has two solutions, not one, and subtraction isn’t a well-defined operation anymore. Likewise the equation *x* × 0.999… = 0.999… has both 0.999… and 1.000… as solutions, so division isn’t well-defined either. The Principle of Permanence gets violated in a big way. This defective number system often crops up as a sort of way-station when people try to define the real number system in terms of decimal representations, as was done nearly half a century ago by Faltin, Metropolis, Ross, and Rota and was done more recently by Fardin and Li.

#10. For more about infinite decimals and the Archimedean property, you can watch a video of a talk I gave to middle schoolers at MathPath in the summer of 2022, or just check out the slides.

#11. If *p*/*q* is less than 1, *p* must be less than *q*, so *p* is at most *q*−1, so *p*/*q* is at most (*q*−1)/*q*, or 1−1/*q*. But now we can find a counting number *n* for which 10^{n} is bigger than *q*. Then 1/10^{n} is smaller than 1/*q* and 1−1/10^{n} is bigger than 1−1/*q*. That is, (10^{n}−1)/10^{n} is bigger than 1−1/*q*. But (10^{n}−1)/10^{n} is one of the fractions in our sequence 9/10, 99/100, 999/1000, …

#12. If you’re unfamiliar with the concept of retroactive continuity, check out what Merriam-Webster has to say about it (or, if you dare, visit the TV Tropes page and risk being lured down the TV Tropes rabbit-hole). Someday I’ll write a Mathematical Enchantments essay called “Retroactive Continuity” about how the modern concept of continuity got retconned into the foundations of calculus.

The notion of quantity shorn of context – that is, the advent of the concept of Number – was the greatest mathematical revolution of all time, the one that made all subsequent developments possible. I don’t have much to say about it because we know so little about it, but since most great advances involve trade-offs I want to mention two of the hazards made possible by the abstract number concept.

One hazard is logistical, and is exemplified by the famous failure of NASA’s Mars Climate Orbiter in 1999. A subcontractor submitted data in which force was measured in pounds rather than newtons, and NASA didn’t catch the mismatch. If the numbers stored in NASA’s computers had had units attached, the discrepancy would have been caught. But in twentieth-century data-processing^{2} the focus was on speedy operations using compressed representations of data, with all the fat (such as units of measurement) trimmed away. Programs were far removed from the meaning of the data being manipulated.

The other hazard is moral. Picture a harbormaster of the 1720s looking at a ledger he has just received from the captain of a ship newly arrived from Africa. The ledger informs him that 213 units of cargo were lost at sea. He multiplies the number 213 by the typical price each unit of cargo would fetch at auction to compute the total loss of value of the ship’s contents. His arithmetic is flawless, unimpeachable. He does not pause to consider that the 213 units of cargo are enslaved people who died on the trip. The blankness of the numeral invites detachment from the reality it refers to. Numbers can numb us, even blind us.

Yet, offsetting the disadvantages of abstraction we have the universality of arithmetic. People have disagreed about many things (such as who made the world and why, and how those who inhabit that world are supposed to comport themselves while passing through) and have gone to war against those who disagreed with them, but propositions about arithmetic have not been the sort of thing that made people kill other people. (Well, there’s been a bit of a culture war about 2+2=4 lately, but it’s a war of words, not weapons, and 2+2=4 is really a proxy for other things.) Different cultures give different names to the counting numbers^{3}, but all of them agree about the facts governing those numbers. And one of the most curious facts is that there is *no last number*. You can run out of sheep or sacks or any other thing you care to count, and you can even run out of names for numbers and be forced to invent new ones, but you’ll never run out of numbers themselves. No matter how big a number you have, you can always add 1 to it, and that new number will be bigger. With this insight, we see a wondrous and somewhat terrifying vista opening up before us, a vista of numbers without end – an infinite stairway with a bottom but no top, vanishing into the clouds.

**HOW DO WE KNOW?**

It’s often a good idea to ask the question “How wrong might we be?” and the related question “Why are we so sure that we’re right?” Even in situations in which doubt seems a bit silly, employing our capacity for doubt is good exercise, and sometimes fun. The stairway of counting numbers is no exception. Consider the short story “The Secret Number” by Igor Teper, which invites us to believe that there is a counting number between three and four that *They* don’t want you to know about. In Teper’s story, published in 2000, the secret number is called “bleem”, but I remember hearing about a similar secret number called “bleen” back in the 1970s.^{4}

Wondering how wrong we might be was a serious matter for 19th century mathematicians. Reeling from the discovery of non-Euclidean geometries and other challenges to mathematical intuition, many mathematicians thought that, while it was all well and good to add new stories atop the edifice of mathematics, someone needed to go down below and make sure that the foundation was sound and that the basement wasn’t flooded.

One such mathematician was Giuseppe Peano (1858-1932). Although he’s remembered for his mathematics, he was also extremely interested in language;^{5} one of his pet projects was the development of an international language, specifically a kind of uninflected Latin. He thought that by reviving and reconfiguring the moribund lingua franca of the Renaissance, he could create a shared vehicle of expression for modern thinkers and writers. As it happens, his work on the foundations of arithmetic (in collaboration with Richard Dedekind and others) also involved a revival of something ancient: those tally marks we started with, or rather, the unary numeral system that underlies them.

In Peano’s system (the version that starts from zero rather than one) the numbers are named 0, S0, SS0, SSS0, etc., to be pronounced respectively as “zero”, “the successor of zero”, “the successor of the successor of zero”, “the successor of the successor of the successor of zero”, etc. (The names of numbers get tiresome very quickly!) Note that this system is in its essence a fancy version of using tally-marks; replace the S’s by notches and omit the trailing 0 and you’re back in the Paleolithic period.

If your goal is to work with numbers for practical, everyday purposes, such as ordering fish for a party of penguins, unary notation is awkward to use.

But if like Peano you want to establish mathematics on a firm foundation, then the simplicity of unary makes it a winner. Consider for instance the task of adding numbers together. In the modern Hindu-Arabic system, to add two numerals you have to line them up and add digits systematically, using your short-term memory (or some external extension of your short term memory) to propagate carries appropriately. In contrast, to add two unary representations, you just write them side by side.

Well, Peano addition is actually more complicated than that, because the injunction “Write them side by side” is not expressed in the language of the Peano axioms. Peano’s prescription for adding numbers is a little more subtle; it’s a recursive^{6} procedure that expresses the answer to addition problems in terms of the answers to other addition problems. If you want to add two numbers in Peano’s system (call them M and N), there are two cases. If N is zero, then M+N is just M. Otherwise, if N isn’t zero, N has some S’s in it; in this case, M+N (the thing we want to compute) is just M+N′ with an S stuck in the front, where N′ is just N with an S removed.

The process is less confusing than it sounds; an example should clarify the definition. Let’s compute two plus two, or rather SS0 plus SS0. Putting M = SS0 and N = SS0, we find that we’re in the case where N isn’t zero, so the “Otherwise” clause tells us that SS0 + SS0 is S(SS0 + S0) (that is, S followed by whatever SS0 + S0 turns out to be). But what’s the value of the expression inside the parentheses? That is, what’s SS0 + S0 ? Putting M = SS0 and N = S0, we find that we’re again in the case where N isn’t zero, so the “Otherwise” clause tells us that SS0 + S0 is S(SS0 + 0). And now, putting M = SS0 and N = 0, we’re (at last) in the case where N is 0, so SS0 + 0 is SS0, so (heading back upstream) SS0 + S0 is S(SS0) = SSS0, and SS0 + SS0 (the thing we were originally trying to compute) is S(SSS0) = SSSS0, which is Peano’s name for four. And two plus two indeed equals four.
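If you represent Peano numerals as strings of S’s ending in 0, the recursive definition of addition translates into Python almost word for word (this rendering is mine, not Peano’s, of course):

```python
def add(m: str, n: str) -> str:
    """Peano addition on numerals written as strings like 'SS0'.
    If n is zero, m + n is just m; otherwise m + n is S(m + n'),
    where n' is n with one leading S removed."""
    if n == "0":
        return m
    return "S" + add(m, n[1:])

print(add("SS0", "SS0"))  # SSSS0: two plus two is four
```

Each recursive call peels one S off the second addend and glues one S onto the front of the answer, exactly tracing the upstream-and-back computation above.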

Likewise, one can give a procedure for multiplying Peano numerals: if N is zero, then M×N is just 0, while if N isn’t zero, M×N is M×N′ plus M, where N′ is N with an S removed as before.
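Here’s the multiplication rule rendered the same way in Python (again my own rendering, with the addition rule included so the sketch is self-contained):

```python
def add(m: str, n: str) -> str:
    """Peano addition: m + 0 = m; m + Sn' = S(m + n')."""
    return m if n == "0" else "S" + add(m, n[1:])

def mul(m: str, n: str) -> str:
    """Peano multiplication: m * 0 = 0; m * Sn' = (m * n') + m."""
    return "0" if n == "0" else add(mul(m, n[1:]), m)

print(mul("SS0", "SS0"))   # SSSS0: two times two is four
print(mul("SSS0", "0"))    # 0: anything times zero is zero
```

Notice that multiplication is defined entirely in terms of addition, which is itself defined entirely in terms of succession.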

In the context of Peano’s work, these two recursive algorithms shouldn’t be thought of merely as ways of performing calculations. Rather, these algorithms are to be treated as *definitions*. Much as Euclidean geometry starts with just a handful of basic notions like points and lines and builds up more complicated objects like trapezoids and angle bisectors in terms of the basic notions, Peano’s approach starts only with the idea of succession, and a few axioms governing it, and builds everything up from there: addition, multiplication, subtraction, division, primes, you name it. (And along the way, we can easily banish bleem and bleen.)

**A TRIP TO THE BASEMENT**

I first encountered the (to my young self, audacious) idea of defining arithmetic purely in terms of the successor operation S in Kershner and Wilcox’s 1974 book *The Anatomy of Mathematics* (a reworking of Edmund Landau’s classic 1930 work *Grundlagen der Analysis*^{7}) when I was a teenager. I found it thrilling. In a way, reading the book was reminiscent of going down into the depths of the house I grew up in, back when I was a child. The basement was a mysterious place, and I had the impression that if you flipped the wrong switch you could make the house explode.^{8} The basement contained a boiler that gave us hot water, and a valve that controlled the flow of water throughout the house, and circuit breakers that controlled the electrical fixtures in the parts of the house where we spent most of our time. Going into the basement of mathematics, and seeing where all the valves and circuit breakers were, provided all the thrill of my childhood basement with none of the imagined danger. I’d long known that addition and multiplication were commutative (M+N = N+M and M×N = N×M for all counting numbers M and N), but after reading *The Anatomy of Mathematics* I knew how to *prove* it.

The book went on to build up the real number system in stages, but that’s not what I want to focus on today. Rather, I want to discuss the subtle way in which Kershner and Wilcox shifted my mental image of what a counting number was, from decimal numerals to something more abstract like “successor-of-successor-of-. . . -successor-of-zero”. The change didn’t happen on a conscious level, and given an arithmetic problem I’d still solve it the way I’d been taught to solve it in grade school (as opposed to, say, translating the problem from decimal to unary, solving it in unary, and then translating the answer back to decimal). But on some deep level, in reading the book I imbibed the Platonist Kool-Aid that whispered (if a beverage can be said to whisper) *Beneath it all, this is what counting numbers are*.

I was probably also affected by what Kershner and Wilcox didn’t do in their book. Specifically, they didn’t build a bridge between Peano’s way of writing counting numbers and the Hindu-Arabic decimal system I’d grown up with. (As the authors say in their Preface, “The reader does not even need to know the sum of 7 and 5; incidentally, if he does not know this sum he will not learn it from this book.”) The authors go so far as to prove the quotient-and-remainder theorem, which says that given positive integers N and D (for “divisor”) there exist unique non-negative integers Q (for “quotient”) and R (for “remainder”), with R between 0 and D−1, satisfying the relation N = D × Q + R. For instance, if N = 213 and D = 10, we get 213 = 21 × 10 + 3. Armed with the quotient-and-remainder theorem, Kershner and Wilcox could have recursively defined the decimal expansion of N as the decimal expansion of the quotient Q followed by the digit representing the remainder R, in the case where the divisor D is ten. But from their point of view, that would’ve been a cruel anticlimax – like bringing someone to a majestic castle, leading them up to a high balcony with a panoramic view of towering cliffs and a surging sea and then telling them “Did you know you can see your house from here?”
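That unwritten bridge is short. Here’s the recursive definition they could have given, sketched in Python (my own rendering, leaning on the built-in divmod for the quotient-and-remainder theorem with divisor ten):

```python
def decimal(n: int) -> str:
    """Recursively build the base-ten numeral for a counting number n:
    the numeral for n is the numeral for Q followed by the digit for R,
    where n = 10 * Q + R with 0 <= R <= 9."""
    q, r = divmod(n, 10)          # quotient and remainder on division by ten
    digit = "0123456789"[r]
    return digit if q == 0 else decimal(q) + digit

print(decimal(213))  # '213'
```

The recursion bottoms out when the quotient reaches zero, so a number gets exactly as many digits as it needs.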

You might say that I’d been prepared for Kershner and Wilcox’s book by my early immersion in the pedagogical experiment known as the New Math. New Math had lots of problems in terms of both its design and implementation, but it worked very well for me (as one might have expected, given that it was largely designed by mathematicians). In particular, the New Math laid emphasis on the difference between numbers and numerals. Numerals were the things you wrote down; numbers were the things numerals denoted. What exactly was a number, then? New Math invited me to ask the question. *The Anatomy of Mathematics* answered it.

**GETTING THE PICTURE**

Peano proposed different versions of his axiom system during his lifetime, and the people who came after him introduced more. Some versions use just three axioms; others use a dozen. Some versions include axioms governing equality (e.g., “if M=N then N=M”); others presuppose them. Some versions incorporate addition and multiplication from the get-go and include axioms governing them, rather than building them up from the successor operation.

I’m going to join the ranks of Peano’s successors (sorry!) by presenting my own version, phrased in the language of combinatorics. A *directed graph* (sometimes called a *digraph* for short) is a bunch of nodes connected by arrows, like this:

The specific digraph we’re trying to characterize is the one you get if you create a node for each of the counting numbers, with an arrow pointing from each counting number to its successor.

You’ll notice that I’ve stripped off the names of the numbers; all that remains is the structure of succession, depicted as an abstract stairway.

Here are the axioms that single out this particular architecture:

Axiom 1: Every node has exactly one arrow pointing out of it.

Axiom 2: Every node has exactly one arrow pointing into it, except for one special node, which has no arrows pointing into it.

Axiom 3: If you color the nodes blue and red, where the special node from Axiom 2 is colored blue and at least one other node is colored red, then there must be an arrow from a blue node to a red node.

(This last axiom is equivalent to the axiom of mathematical induction. If it looks unfamiliar, here’s a paraphrase, a sort of contrapositive, that might be more reminiscent of the versions you’ve seen: If you color the nodes blue and red, where the special node from Axiom 2 is blue, but there is no arrow from a blue node to a red node, then all the nodes are blue.)

Once we’ve characterized the digraph with these axioms, we can put the labels back in if we like. That is, we can define 0 as the special node with no incoming arrows, define 1 as the target-endpoint of the unique arrow pointing out of 0, define 2 as the target-endpoint of the unique arrow pointing out of 1, etc.

Then we can define addition and multiplication as described above, prove the important properties of those operations, and if we like, do some calculations in the guise of proofs. Here for instance is Kershner and Wilcox’s proof that 2 times 2 is 4:

Notice that this proof uses just addition and multiplication. We’ve thrown away the scaffolding embodied in Peano’s successor-operation S. We’re back where we started, but also, in a way, somewhere else. We’ve arrived at a deeper understanding of what lies beneath the counting numbers.

**WHAT WE CAN KNOW**

The twentieth-century weekly radio show “A Prairie Home Companion” had a recurring feature called “The news from Lake Wobegon”, in which host Garrison Keillor would describe fictional happenings during the past week in his iconic, nonexistent home town of Lake Wobegon, Minnesota. Each week he’d end the news segment with the same tag-line: “And that’s the news from Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average.” That last line gave humorous expression to the fact that most parents think their children are objectively special, and it even gave rise to a new bit of psychological jargon. But curiously, a version of the Lake Wobegon fallacy applies to the counting numbers, not as a fallacy but as a fact – specifically, the fact that every counting number is smaller than average.

When I say smaller, I don’t just mean smaller. I mean much, much smaller. Even a number that’s biggish by human standards, such as a trillion, is smaller than nearly every other number. Replacing a trillion by an even bigger number – go ahead, take your pick – will kick this paradox down the road (or rather up the staircase) but won’t abolish it. No matter what counting number N you pick, I claim that it’s atypical because there are infinitely many numbers that are bigger than N and only finitely many that are smaller. Am I saying that the “average” counting number is infinite? Sort of. But there are no infinite counting numbers. Every counting number is finite. The infinite stairway doesn’t have a top; it’s not that kind of stairway.

In the face of an imaginary structure of such daunting and paradoxical infinitude, you might think that human reason would be powerless. We can never explore more than the tiniest bit of the stairway, so how can we possibly make brash assertions about what is or isn’t to be found in its upper reaches?

Yet we can know some things about the stairway, and know them for certain. And the firm foundations provided by Peano and others give us the tools with which to do it.

For instance, we can know that no matter how high up the stairway we go, we’ll never find two consecutive counting numbers whose sum is even. To assert this so boldly is not hubris; our conclusion is just a consequence of the meaning of the word “even”, and the basic axioms and theorems governing the stairway. If we’re standing on the tread marked N, then the next tread is marked N+1, and the sum of these two numbers is N+(N+1), which can also be written as 2×N + 1; if you divide this sum by 2, you’ll get N with 1 left over.

Perhaps you feel that the mathematical universe is just a little bit emptier because of its lack of two consecutive counting numbers whose sum is even, and more broadly because some of the things we can imagine existing turn out to be impossible, but here is some consolation. We just saw that if you add together two consecutive counting numbers, the sum is *never* divisible by two. On the other hand, if you add together three consecutive counting numbers, the sum is *always* divisible by three. And if you add together four consecutive counting numbers, the sum is *never* divisible by four; while if you add together five consecutive counting numbers, the sum is *always* divisible by five. This alternating pattern continues forever: if K is even, the sum of K consecutive counting numbers is *never* divisible by K, while if K is odd, the sum of K consecutive counting numbers is *always* divisible by K. This beautiful numerical pattern (not hard to prove, by the way)^{9} grows from a soil compounded equally of existence and nonexistence, of *never*s and *always*es.
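If you’d rather check the pattern than prove it, a brute-force Python loop (my own, covering only small cases, which is of course no substitute for a proof) will do:

```python
# Verify: the sum of K consecutive counting numbers starting at N
# is always divisible by K when K is odd, never when K is even.
for K in range(2, 12):
    for N in range(1, 200):
        total = sum(range(N, N + K))   # N + (N+1) + ... + (N+K-1)
        divisible = (total % K == 0)
        assert divisible == (K % 2 == 1)
print("pattern holds for K = 2..11, N = 1..199")
```

The proof behind the check: the sum equals K×N + K(K−1)/2, and the second term is a multiple of K exactly when K is odd.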

If math were the kind of game in which you could find counting numbers satisfying whatever combination of properties you like, then math would be poorer, not richer. Math – the kind I love, anyway – is about the beautiful patterns at the border between the possible and the impossible. If everything were possible, there’d be no contrast, no patterns, and no math. The phantoms of the stairway – the numbers that on shallow thought seem as if they might exist, but on deeper thought are seen to be self-inconsistent – live on as meaningful absences, subsumed into patterns that have a life of their own.

But also…

Some of the phantoms of the stairway *do* live; they just live elsewhere. The stairway is full of beauty and wonder but it isn’t the only place to be. There may not be a counting number N with the property that N plus N+1 is even, but there are rational numbers with that property. There may not be a rational number whose square is 2, but there are real numbers with this property. There may not be a real number whose square is −1, but there are imaginary numbers with this property. The study of number systems is the study of elsewheres, of places-to-be that resemble the infinite stairway while differing from it in important ways. And for the next couple of years, this blog will focus largely (though not exclusively) on some of those elsewheres.

*Thanks to Ori Gurel-Gorevich.*

**REFERENCES**

R. B. Kershner and L. R. Wilcox, The Anatomy of Mathematics.

Edmund Landau, Grundlagen der Analysis. A translation by F. Steinhardt is available on the web at https://salamon.sdsu.edu/Math534A/LandauReading.pdf.

**ENDNOTES**

#1. Not all archeologists agree that the Lebombo bone was used for counting; some consider the more recent Ishango bone to be the oldest surviving mathematical artifact. I personally want to believe that the Lebombo bone is indeed the sole surviving evidence of some ancient person’s pressing need to record the number twenty-nine. Do its twenty-nine notches count days in a lunar month? Some scholars think so. Perhaps women in that society used arithmetic to keep track of their menstrual cycles, maintaining control over their reproductive capacity until patriarchs arose and imposed a tally-ban.

#2. Newer programming languages, especially object-oriented ones, permit metadata alongside data, so perhaps newer computer programs won’t be subject to the sort of problem NASA fell victim to.

#3. One charming counting system is the “Yan, tan, tethera, methera, …” count used by shepherds in some parts of England. What I find most delightful about this system is the way it was used mostly, perhaps exclusively, for the counting of sheep. My guess is that if in the heyday of the system you had used it for some other purpose, say counting out coins, people would have thought you were being metaphorical for comic effect. Nowadays “yan, tan, tethera, methera, …” survives mostly among knitters counting stitches, and they are probably not thinking of stitches as individual sheep, since the amount of wool in a single stitch is closer to one microsheep.

#4. George Carlin apparently heard about bleen too, because in one of his routines he announced “The Nobel Prize in mathematics was awarded to a California professor who has discovered a new number! The number is bleen, which he claims belongs between 6 and 7.” If anyone can trace bleen/bleem back to the 1970s or earlier, please post to the Comments!

#5. Wikipedia calls Peano a glossologist rather than a linguist; can someone explain the difference to me?

#6. In my use of the word “recursive”, I’m hinting at the real technical crux of Peano’s system, which is the axiom of induction. Check out Kershner and Wilcox’s book if you want to see how the axiom of induction lets you devise and work with recursive definitions in Peano’s system.

#7. I love the paradoxical pair of admonitions Landau offers his student-readers in his preface: “Please forget what you have learned in school; you haven’t learned it.” “Please keep in mind everywhere the corresponding portions of your school work; you haven’t actually forgotten them.”

#8. I think in hindsight that my father must have said that the pipes could explode if I turned off the heat in winter when we left for vacations, but the main thing I took from his warning was the word “explode”.

#9. The sum of K consecutive counting numbers is equal to K times the average of the numbers; this product is a multiple of K precisely when the average is itself a counting number. If K is odd, then the average is just the number in the middle; but if K is even, then there are two numbers in the middle, so the average is the number halfway between those two consecutive counting numbers, which is not itself a counting number.

I’m sure you’ve counted (“One, two, three, . . . ”) on too many occasions to count. The process can be boring (counting sheep), exciting (counting your winnings at a casino), or menacing (“If you kids aren’t at the dinner table by the time I reach ten, I’ll …”). But one thing counting is *not* is liberating. What could be less free than the inexorable succession of the counting numbers? And yet the very regularity of counting numbers gives us the freedom to think about them in multiple ways, arriving at conclusions along delightfully varied paths.

Consider the classic problem of adding all the numbers from 1 up to 100. The obvious method of computing the sum takes a long time, which is why (according to a legend that may or may not be true) a certain schoolteacher in Germany a few centuries ago asked his students to work it out on their slates; he wanted to buy himself a bit of peace. Unfortunately for him, one of his students was the young Carl Friedrich Gauss, future doyen of European mathematics, who knew even then that when you add up a bunch of numbers, the order in which you add them doesn’t matter. That regularity gave Gauss the freedom to add them in a different order, peeling off the numbers from both ends of the list in alternation:

1+100+2+99+3+98+…+49+52+50+51

Pairing up the numbers two by two as

(1+100)+(2+99)+(3+98)+…+(49+52)+(50+51),

Gauss quickly saw that the answer was 101 + 101 + 101 + … + 101 + 101, or 101 × 50, and astonished his teacher by writing “5050” on his slate before even a minute had passed.^{1}
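Gauss's pairing trick is easy to act out in code. A minimal sketch (everything here is illustrative, not historical!):

```python
# Gauss's pairing trick for 1 + 2 + ... + 100: fold the list in half so
# that each pair sums to 101, then multiply by the number of pairs.

numbers = list(range(1, 101))
pairs = [(numbers[i], numbers[-1 - i]) for i in range(50)]  # (1,100), (2,99), ..., (50,51)

assert all(a + b == 101 for a, b in pairs)  # every pair sums to 101
total = 101 * 50
assert total == sum(numbers) == 5050
print(total)  # 5050
```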

Gauss wasn’t the first person to figure out how to add the numbers from 1 up to *n*; the ancient Greeks (and probably other civilizations whose mathematical ideas weren’t as amply recorded or don’t get as much attention) knew that the sum is always half of the product of *n* and *n*+1. The way they proved it was by cutting an *n*-by-(*n*+1) array of dots into two triangles, as shown below for *n* = 10:

The triangular region to the left of the diagonal, read by rows from top to bottom, has 1+2+…+10 dots, and the triangular region to the right of the diagonal, read by rows from bottom to top, also has 1+2+…+10 dots. So, taking inventory of all the dots in the 10-by-11 rectangle, we see that twice 1+2+…+10 must equal 10 times 11, which implies that 1+2+…+10 must equal half of 10 times 11.

The same reasoning shows that for any counting number *n*, the sum 1+2+…+*n* must equal *n*(*n*+1)/2. For *any* counting number *n*. No matter how big! The fact that we can know this is a pretty amazing thing when you stop to think about it. There are bigger numbers than we can ever count to, bigger numbers than we could ever write down, bigger numbers than we will ever imagine with our finite brains – yet our argument shows that no matter how big *n* is, there’s a relationship between the value of *n* and the value of the sum of all the counting numbers up to *n*.^{2}

**MATHEMATICAL INDUCTION**

Let’s look at a different proof. It won’t give the same jolt of insight that you get from looking at the picture on the previous page, but the method scales up to tackle harder problems (like showing that 1^{4} + 2^{4} + … + *n*^{4} = *n*(*n*+1)(2*n*+1)(3*n*^{2}+3*n*−1)/30, say) in a way that the geometrical approach doesn’t.
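If you'd like to convince yourself that the fourth-power formula at least isn't a typo before seeing how induction proves it, a quick spot-check in Python:

```python
# Spot-checking the identity
#   1^4 + 2^4 + ... + n^4 = n(n+1)(2n+1)(3n^2 + 3n - 1)/30
# for a range of n. (Induction proves it for all n; this only corroborates.)

for n in range(1, 200):
    lhs = sum(k**4 for k in range(1, n + 1))
    rhs = n * (n + 1) * (2 * n + 1) * (3 * n**2 + 3 * n - 1) // 30
    assert lhs == rhs
print("identity holds for n = 1..199")
```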

Let’s start by examining the proposition 1+2+3+…+99+100 = 5050 and the proposition 1+2+3+…+99 = 4950. Please forget for a moment that you already know that the former proposition is true, because that will distract you from the subtler point I’m trying to make. Let’s give these propositions names. Let *P* be the proposition “1+2+3+…+99 = 4950” and *Q* be the proposition “1+2+3+…+99+100 = 5050”. (Note that *P* and *Q* are not numbers; they’re assertions of numerical equality.) I claim that the left-hand side of *Q* is 100 more than the left-hand side of *P* and that the right-hand side of *Q* is 100 more than the right-hand side of *P*. Check it out:

*P*: 1+2+3+…+99 = 4950

*Q*: 1+2+3+…+99+100 = 5050

The left-hand side of *Q* is just like the left-hand side of *P*, except that there’s an extra 100; and the right-hand side of *Q* is 5050, which is 100 more than 4950. So *Q* is just *P* with 100 added to both sides, and *P* is just *Q* with 100 subtracted from both sides. *P* and *Q* either stand together or fall together. They’re either *both* true or *both* false.

You may wonder: Where am I going with this? All the way down to 1, is where! (That’s as low as we can go; I’m not considering zero to be a counting number.) For each counting number *n* between 1 and 100, let *P*_{n} be the proposition 1+2+…+*n* = *n*(*n*+1)/2.

Have I proved *P*_{100} yet? Not quite. But I’ve shown you that *P*_{100} and *P*_{99} are either both true or both false, and that *P*_{99} and *P*_{98} are either both true or both false, and so on, ending with the claim that *P*_{2} and *P*_{1} are either both true or both false. So it’s a package deal: you must either believe all one hundred of these assertions or disbelieve all one hundred of them.

And now for the punchline: go back and look at *P*_{1} again. It asserts merely that 1 = (1)(2)/2, which I’m sure you believe. So you must buy the whole package, and assent to all one hundred of the propositions, including *P*_{100}, the one we were interested in to begin with.

Note that to make the argument work, we didn’t actually need to know in advance that for each *n* the propositions *P*_{n} and *P*_{n−1} are true; all we needed to know was that each adjacent pair of propositions stands or falls together.

But why stop at 100? The same reasoning applies to larger numbers too. For every counting number *n*, if *P*_{n} is the proposition that 1+2+…+*n* = *n*(*n*+1)/2, then *P*_{n} and *P*_{n−1} stand or fall together; and since *P*_{1} is plainly true, every one of the infinitely many propositions *P*_{n} must be true.

Warning: Even though there are infinitely many counting numbers, you shouldn’t get the idea that infinity is itself a counting number. It isn’t. The infinite stairway has a bottom marked 1, but it doesn’t have a top marked ∞. Finite staircases always have a top tread, but our fictional infinite staircase doesn’t. Some find this wonderfully odd while others find it disturbing. Indeed, some mathematicians think that because the human mind is finite, we need a radically finite mathematics that banishes the infinite. These are the finitists, who want us to view the infinite stairway not as a completed thing (not even a fictional one) but as a blueprint for a structure that can never be completed. The radical wing of finitism is ultrafinitism, which asserts that really really really big counting numbers don’t exist.

My own prediction, based on what I know of mathematical history, is that mathematics, with its track record of expanding to accommodate different philosophies of mathematics, will eventually build a big enough tent to house the ultrafinitists. But I also predict that ultrafinitistic proofs will be more complicated than their infinitistic counterparts and will be very difficult to understand for those who lack a grounding in infinitistic mathematics. By way of analogy, consider the way we teach Newtonian physics as a prologue to Einsteinian physics; the former is just an approximation to the latter, but it’s hard to understand the truer relativistic theory without understanding its less-true non-relativistic precursor. So the role played by the infinite stairway in the philosophy of mathematics may change in the coming centuries, but it is not likely to be supplanted as a mental image for the working mathematician or for the student learning mathematics.

**GET OUT YOUR CRAYONS**

If all this talk about sums and propositions and truth seems too abstract and colorless, here’s a down-to-earth way to think about induction via a coloring game.

I write down the numbers from 1 to *n* in a row (in the picture I’ve chosen *n* = 10), I mark the number 1 with a smear of blue crayon and the number *n* with a smear of red crayon and then I hand the blue crayon to you.

After that, we’ll take turns making marks on not-yet-marked numbers, with you marking numbers blue and me marking numbers red. The game ends when there are two consecutive numbers (call them *k*−1 and *k*) with *k*−1 marked blue and *k* marked red (call this a “blue-red pair”); at that instant, whoever just moved (creating the blue-red pair) loses. Blue-red pairs are forbidden but red-blue pairs are allowed; if *k*−1 is marked red and *k* is marked blue, play may continue.

Perhaps you’re wondering, what happens when there aren’t any not-yet-marked numbers and nobody’s lost the game yet? Perhaps you should try to construct a line of play that ends in a draw before you read further.

A famous theorem called Sperner’s Lemma tells us that a draw can’t happen. Specifically, the assertion that a draw can’t occur in our game is the 1-dimensional case of Sperner’s Lemma. (Sperner’s Lemma is usually discussed only in 2 dimensions and higher, where it’s far more interesting.) We can prove this by induction. We know that 1 is blue, so if 2 ever gets colored red, a blue-red pair is formed and somebody loses. So a draw can only happen if 2 is colored blue at the end of the game. What about 3? The same reasoning applies: we’ve shown that if there’s a draw, 2 must be colored blue at the end of the game, and if 3 is colored red, then a blue-red pair is present and the game wasn’t a draw after all. And so on. Ultimately, we reach *n*−1, and show that it too must eventually be colored blue. But at the moment that happens, *n*−1 and *n* will form a blue-red pair, and somebody (namely the blue player, namely you) loses. So a draw is impossible.

Notice that we can state this result in a less combative way: regardless of whether the players compete or collaborate, there’s no way to color the numbers 1 through *n* so as to simultaneously satisfy the constraints (a) 1 is blue, (b) *n* is red, and (c) there are no blue-red pairs. The three conditions are incompatible.^{5}
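For small *n* we can confirm the incompatibility by sheer enumeration. Here's a sketch (the function names are mine) that tries every possible coloring for *n* = 10:

```python
# Brute-force confirmation of the 1-dimensional Sperner claim for small n:
# with 1 colored blue and n colored red, every coloring of the numbers in
# between still produces at least one blue-red pair.

from itertools import product

def has_blue_red_pair(coloring):
    """True if some blue number is immediately followed by a red number."""
    return any(coloring[i] == "B" and coloring[i + 1] == "R"
               for i in range(len(coloring) - 1))

n = 10
for middle in product("BR", repeat=n - 2):      # all 2^(n-2) colorings of 2..n-1
    coloring = ("B",) + middle + ("R",)         # 1 is blue, n is red
    assert has_blue_red_pair(coloring)
print("no draw is possible for n =", n)
```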

The reason I’ve taken this detour is that the fact we just learned about the Sperner game (to wit, that conditions (a), (b), and (c) are incompatible) isn’t just an application of induction; you can turn things around and use the incompatibility result to *prove* the principle of mathematical induction!

Suppose we have some propositions *P*_{1}, *P*_{2}, *P*_{3}, … and we’d like to prove that they’re all true. Furthermore, suppose that *P*_{1} is true, and suppose that whenever *P*_{k} is true, *P*_{k+1} is true as well. If some proposition *P*_{n} were false, we could mark each number from 1 to *n* blue or red according to whether the corresponding proposition is true or false. Then 1 would be blue, *n* would be red, and there could be no blue-red pairs (a blue *k* followed by a red *k*+1 would mean a true *P*_{k} with a false *P*_{k+1}, contrary to our supposition). But we’ve just seen that those three conditions are incompatible. So no *P*_{n} can be false: all the propositions are true, which is exactly what the principle of mathematical induction asserts.

[Hey readers: Did you like this section? It’s a bit of an unusual take on induction. Was it helpful or was it distracting? Let me know in the Comments!]

**BANISHING PHANTOMS**

Mathematical induction is great for proving that certain things always happen, but it can also be used to show that certain things *never* happen. (This shouldn’t be surprising, though, since Never is just another kind of Always.)

Say you want to draw a regular octagon on graph paper, like this:

This first effort isn’t bad, but it’s fairly evident that the horizontal and vertical sides are slightly longer than the diagonal sides. Can we do better? For that matter, why settle for merely *better*: can we do it *perfectly*?

What we are looking for are numbers *a* and *b* to replace 2 and 3 in the picture that will give us a regular octagon. That is, we want whole numbers *a* and *b* with the property that the hypotenuse of an isosceles right triangle with both its legs of length *a* has length equal to *b*. By the Pythagorean theorem, this is equivalent to the equation 2*a*^{2} = *b*^{2}. In other words, we are looking for a perfect square (*a*^{2}) which when doubled (2*a*^{2}) equals another perfect square (*b*^{2}).

So, you charge up the infinite stairway, looking for a counting number *a* with the property that 2*a*^{2} is a perfect square. Surely you’ll find one; in an infinite universe (so goes the cliché) everything you can imagine is bound to happen somewhere eventually, and the stairway is infinite, so surely you’ll eventually find the object of your quest!

Onward, past one million. You haven’t found such an *a* yet, but remember, success favors the bold, not the quitter. Onward, past one billion. Don’t give up now! You’ve invested so much in the search; why throw in the towel and throw all that effort away? Onward, past one trillion. Ignore all the nay-sayers, including the ones in your own head. Believe in yourself! Keep going! …

Well, no — please don’t. You are chasing a phantom; no such number *a* exists. One of the most amazing things about the stairway is that it’s possible for us to know, beyond doubt, that certain number-properties that we can formulate (such as the property “2*a*^{2} is a perfect square”) are not satisfied by any counting number *a* whatsoever. We don’t prove this by conducting an exhaustive survey of the stairway, which the human mind can’t do. Instead, we make use of a curious asymmetry of the stairway: you can ascend forever and never hit an obstacle, but any downward trip in the stairway must eventually end.^{6}
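No finite search can prove nonexistence, of course, but a computer can at least corroborate the claim. A quick Python check (using `math.isqrt` to test for perfect squares):

```python
# A finite search can't prove nonexistence, but it can corroborate it:
# among the first 100,000 counting numbers, no a makes 2a^2 a perfect square.

from math import isqrt

def is_perfect_square(m):
    r = isqrt(m)  # integer square root, rounded down
    return r * r == m

assert not any(is_perfect_square(2 * a * a) for a in range(1, 100_001))
print("no a up to 100,000 makes 2a^2 a perfect square")
```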

I wrote about the method of proof by descent in Reasoning and Reckoning so I won’t give the full details in this essay. But I’ll summarize here one of the conclusions I established there, which is that if *a* is a counting number with the property that 2*a*^{2} is a perfect square, then *a*/2 (call it *a*′) is a smaller counting number with the property that 2*a*′^{2} is a perfect square. Applying this same argument a second time, we find that *a*′/2 (call it *a*′′) is an even smaller counting number with the property that 2*a*′′^{2} is a perfect square. And so on, ad infinitum. We get an infinite sequence of ever-smaller counting numbers *a*, *a*′, *a*′′, … , each with the property that its square, doubled, is a perfect square. But wait a minute: how can we have an unending sequence of counting numbers, each smaller than the one before? There’s no such thing! So we conclude that there’s no such number *a*. It was never more than a phantom.

Here’s a more geometrical way to banish the phantom from the infinite stairway, discovered by Joel Hamkins. Draw an octagon with corners labeled *A* through *H*:

We can draw a new octagon by swinging the edges through 90 degrees about an endpoint. For instance, we swing edge *AB* 90 degrees about *A*, and call the new point *A*′; we swing edge *BC* 90 degrees about *B*, and call the new point *B*′; and so on, around the octagon.

Here are the two things to notice: (a) if *A* through *H* are grid-points, then *A*′ through *H*′ must be grid-points as well; and (b) if *AB*···*H* is a regular octagon, then *A*′*B*′···*H*′ is a regular octagon. So if our original octagon had both properties, we can repeat the process as many times as we like, obtaining ever-smaller regular octagons with corners in the square grid. But all these octagons have side-lengths equal to counting numbers, so we get a sequence of ever-smaller counting numbers, which we know is impossible.
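Here's a sketch of the edge-swinging step in code. The coordinates and orientation conventions (corners listed counterclockwise, each edge swung counterclockwise about its starting corner) are my own choices for illustration; what matters is that the near-regular grid octagon determined by (*a*, *b*) = (5, 7) shrinks to the one determined by (2, 3):

```python
# A sketch of Hamkins's edge-swinging construction, under assumed
# coordinates and orientation (corners counterclockwise, rotations
# counterclockwise). Grid points stay grid points automatically,
# because all the arithmetic is integer arithmetic.

def rotate90(x, y):
    """Rotate the vector (x, y) by 90 degrees counterclockwise."""
    return (-y, x)

def swing(corners):
    """Swing each edge about its first endpoint; return the new corners."""
    n = len(corners)
    new = []
    for i in range(n):
        px, py = corners[i]
        qx, qy = corners[(i + 1) % n]
        rx, ry = rotate90(qx - px, qy - py)
        new.append((px + rx, py + ry))
    return new

def squared_side_lengths(corners):
    n = len(corners)
    return [(corners[(i + 1) % n][0] - corners[i][0]) ** 2 +
            (corners[(i + 1) % n][1] - corners[i][1]) ** 2
            for i in range(n)]

# The near-regular octagon determined by (a, b) = (5, 7):
big = [(0, 0), (7, 0), (12, 5), (12, 12), (7, 17), (0, 17), (-5, 12), (-5, 5)]
small = swing(big)

assert squared_side_lengths(big) == [49, 50] * 4   # sides 7 and 5*sqrt(2), alternating
assert squared_side_lengths(small) == [8, 9] * 4   # sides 2*sqrt(2) and 3: the (2, 3) octagon
```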

This method of argument was a favorite of Pierre Fermat’s. He used it for instance to prove the *n*=4 case of what is now called Fermat’s Last Theorem. Specifically, he showed that there don’t exist positive integers *x*, *y*, *z* satisfying *x*^{4} + *y*^{4} = *z*^{4}. In fact, he used the method of descent to prove something stronger: there don’t exist positive integers *x*, *y*, *z* satisfying *x*^{4} + *y*^{4} = *z*^{2}. He showed that if there were a phantom solution, there’d be a smaller phantom solution, and a smaller phantom solution, and so on, ad infinitum, which is impossible, since there cannot be an infinite sequence of ever-smaller counting numbers.

So you could say that the way to banish phantoms from the infinite stairway is to kick them down the stairs!

**UP THE DOWN STAIRCASE**

If the preceding results strike you as having a depressing vibe (“no way”; “can’t be done”; “impossible”; “don’t waste your time trying”), you’ll be glad to learn that the downward impossibility results can sometimes be flipped into upward *possibility* results.

Let’s look again at the picture of the two nested octagons and follow the action more carefully. The big octagon is determined by the two numbers 5 and 7 (the horizontal displacement from *A* to *B* is 7 and the horizontal displacement from *B* to *C* is 5), and the reason the octagon is so close to regular is that twice 5^{2} is very close to 7^{2}. Likewise, the small octagon is determined by the two numbers 2 and 3 (the horizontal displacement from *A*′ to *B*′ is 3 and the horizontal displacement from *B*′ to *C*′ is 2), and the reason the octagon is somewhat close to regular is that twice 2^{2} is somewhat close to 3^{2}. The recipe for getting from *a* = 5, *b* = 7 to *a*′ = 2, *b*′ = 3 is to take *a*′ = *b*−*a*, *b*′ = 2*a*−*b*.

But this method of descent, as I promised you, has a flip side: a method of ascent that lets us create infinitely many near-misses that come close to solving the original problem, systematically producing ever-better grid-approximations to a regular octagon. It’s the descent process, run in reverse: *a* = *a*′+*b*′, *b* = 2*a*′+*b*′. Or if we prefer, *a* = *b*′+*a*′, *b* = *a*′+*a*. Here’s a graphical depiction of the process:

The picture is made of little number-snippets (1,1,2,3; 3,2,5,7; 7,5,12,17; 17,12,29,41, etc.) arranged in three-quarter circles. In each snippet, the third number is the sum of the first and second, and the fourth number is the sum of the second and third. If we wanted to continue the pattern, we’d have 41+29=70 at the top right and 29+70=99 beneath it. We get infinitely many pairs *a*, *b* satisfying 2*a*^{2} − *b*^{2} = ±1. Although the discrepancy between 2*a*^{2} and *b*^{2} never goes below 1 in absolute magnitude, in relative magnitude (compared to *a* and *b*, which keep growing exponentially) the discrepancy is getting exponentially smaller.
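The ascent recipe can be run in a few lines of Python; starting from the pair (1, 1), it reproduces the snippets in the picture:

```python
# Running the ascent recurrence a = a' + b', b = 2a' + b' forward from
# (a, b) = (1, 1); every pair it produces satisfies 2a^2 - b^2 = +1 or -1.

a, b = 1, 1
pairs = [(a, b)]
for _ in range(10):
    a, b = a + b, 2 * a + b  # old (a, b) play the roles of (a', b')
    pairs.append((a, b))
    assert 2 * a * a - b * b in (1, -1)

print(pairs[:5])  # [(1, 1), (2, 3), (5, 7), (12, 17), (29, 41)]
```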

The Indian mathematician Brahmagupta knew all this, and he came up with an even faster method of getting really good approximations by combining two known approximations. Specifically, he discovered that if *u*^{2}−2*v*^{2} = ±1 and *w*^{2}−2*x*^{2} = ±1, then putting *y* = *uw*+2*vx* and *z* = *ux*+*vw* we get a new solution *y*^{2} − 2*z*^{2} = ±1. In a later essay, I’ll explain why this seemingly miraculous way of building new solutions from old is perfectly sensible and unsurprising when viewed through the lens of algebraic number theory.
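Brahmagupta's composition rule is easy to verify numerically. A sketch, feeding in the near-solutions (3, 2) and (7, 5) from the octagon story (note that here *u* plays the role that *b* played earlier, and *v* the role of *a*):

```python
# Brahmagupta's composition: combine two solutions of u^2 - 2v^2 = ±1
# into a new one via y = uw + 2vx, z = ux + vw.

def compose(uv, wx):
    (u, v), (w, x) = uv, wx
    return (u * w + 2 * v * x, u * x + v * w)

assert 3**2 - 2 * 2**2 == 1    # (3, 2) gives +1
assert 7**2 - 2 * 5**2 == -1   # (7, 5) gives -1

y, z = compose((3, 2), (7, 5))
assert y * y - 2 * z * z in (1, -1)
print((y, z), y * y - 2 * z * z)  # (41, 29) -1
```

Composing the two small solutions jumps straight to the pair (29, 41) that the one-step-at-a-time ascent only reaches after four steps.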

**THE BIGGEST MYSTERY**

There are many odd features of this imaginary stairway, equipped with a bottom but no top. One is the futility of attempting to climb it. There may be an illusion of progress, but no matter how many treads we surmount in absolute terms, we have made, in relative terms, no progress at all. For no matter how far we’ve come, the part of the stairway that we have passed is finite, while the part that remains before us is infinite; compared to what lies ahead, what lies behind is negligible. You’re always just beginning your journey; in relative terms, you never get off that first tread.

And yet we can *know* things about the number-stairway, and know them with certainty, even facts that pertain to parts of the stairway we’ll never visit. We can say, for instance, that the far reaches of the stairway contain infinitely many numbers *a* for which 2*a*^{2} − 1 is a perfect square, yet none for which 2*a*^{2} is a perfect square. The way in which the infinite stairway combines knowability and unknowability is part of its allure.

But for me, the most wondrous thing is the way the rigidly one-dimensional stairway, refracted through the human mind, kaleidoscopically unfolds into something more like a landscape than a corridor. This is a landscape not of numbers but of knowledge. The facts of math are not arranged in a line, but rather lie scattered about, and we must arrange them into patterns and then organize the patterns into some sort of story. Facts fall into networks bound together by filaments of logic, and these networks communicate with other networks, forming larger networks which are parts of even larger networks. Indeed, to know the facts beyond doubt, we must construct the sorts of stories that are called proofs. Proofs are a bit like the stairway, in that they are linear, and following the links in a chain of deductions can be a bit like climbing the stair, but devising a proof that works is quite different. The landscape of what’s true doesn’t come with a map, let alone an itinerary, and sometimes the shortest path leading us from the things we already know to the thing we want to know is extremely circuitous, and requires wandering far from where anyone has journeyed.

The staircase is linear but human thought is not. Sometimes intuition leaps over steps in a proof and then fills in those steps after the fact; Gauss himself once said “I have had my results for a long time but I do not yet know how I am to arrive at them.” But for him, a proof was more than a certificate of truth; it was also a source of understanding. A bad proof can tell you *that* something is true without telling you *why* it’s true, and even a good proof can fail to satisfy; mathematicians often want multiple proofs that illuminate some aspect of mathematical reality from various angles. Gauss himself sought proof after proof of the fundamental theorem of algebra because he wanted to understand it deeply. The quest for understanding is more central to mathematics than the quest for mere certainty.

In an earlier, more fanciful draft of this essay, I wrote: “We do not know Who built the stairway, but They did not build it for us.” But I don’t really believe in such a Them, so it seems disingenuous to try to raise readers’ goosebumps in this cheesy fashion.

Still: if there exist beings of infinite mind outside our physical universe who can encompass the stairway and all that it contains, they must have a very different relationship to mathematical truth than we do. They need no proof that counting-number solutions to 2*a*^{2} = *b*^{2} don’t exist; they just see it at a glance, by surveying all counting numbers at once. For us, *P*_{3} is true *because* *P*_{2} is true *because* *P*_{1} is true. For them, *P*_{1} is true *and* *P*_{2} is true *and* *P*_{3} is true, just because they’re self-evident. There’s a huge leveling, and a huge loss, when all mathematical facts are equally transparent. In a way I envy such mathematically omniscient beings, but in a way I pity them. They’re missing out on the stories that connect the facts, and the struggle to construct such stories when one has a finite mind and only a partial understanding of the landscape. For me, the human struggle to know what’s true, and why it’s true, is what gives the quest for mathematical knowledge its drama, its dignity, and its joy.

*Thanks to Sandi Gubin.*

**ENDNOTES**

#1. Some skepticism about this anecdote is in order here. Also, I don’t like the way it reinforces the genius myth. For more on the Gauss story, see my essay Reasoning and Reckoning. The essay you’re reading now can be viewed as something like a second draft of that earlier essay. For more on the genius myth, see my essay The Genius Box.

#2. The pioneering 20th century neuroscientist Warren McCulloch, whose ideas about neurons and computation foreshadowed advances in our own century, decided when he was young that he would devote his life to the two-part question “What is number, that man may know it, and what is man, that he may know a number?” We still don’t have a good answer.

#3. The right-hand side of *P*_{n} is (*n*)(*n*+1)/2; in particular, the propositions *P* and *Q* from before are just *P*_{99} and *P*_{100}, since 4950 = (99)(100)/2 and 5050 = (100)(101)/2.

#4. The principle of mathematical induction has many variants that are equivalent to it, so if you’re thinking “Wait, the induction step means showing that *P*_{n} implies *P*_{n+1}, not showing that *P*_{n} and *P*_{n+1} are either both true or both false”, rest assured that either version of the induction step yields a valid form of the principle.

#5. I think the game is interesting, so if any of you know anything about it, either because it’s already been studied by others or because you played around with it and figured some things out, please let me know in the Comments!

#6. Let *P*_{n} be the proposition that any downward trip that starts at tread *n* must eventually end. *P*_{1} is clearly true (from the bottom tread there’s nowhere lower to go), and if *P*_{1} through *P*_{n} are all true then so is *P*_{n+1}, since the first step of a downward trip from tread *n*+1 lands on some tread *m* ≤ *n*. So by induction, every downward trip must end.

#7. The pictures above show octagons with horizontal and vertical sides, but the argument also works for canted octagons like this:

The conclusion is that you can have a regular octagon, and you can have an octagon whose corners are on the square grid, but you can’t have an octagon that achieves both feats at the same time. A variant of this argument works for grids in *d* dimensions for all *d* > 2. So if you’re one of those people who thinks that our seemingly continuous physical world is actually made up of little cubes analogous to the pixels that constitute digital photographs, then no regular octagons for you!

Magic paper helps me with some problems that have long bedeviled classroom teachers like myself: How do you find out what’s going on inside your students’ heads in the midst of a lesson without derailing it? How do you get all your students to actively participate without having the class descend into chaos? How do you communicate with a large group of students without the conversation devolving into what math educator Henri Picciotto calls a “pseudo-interactive lecture” dominated by the teacher and the two or three most vocal students?

Back in the 1980s, educator David R. Johnson tackled these problems using what he called the “paper and pencil method” of getting real-time feedback on how students are doing. Under this model, the teacher asks a question, the students write down their answers, and the teacher sees what the students wrote. This was back before magic paper, so the teacher would have to physically move around to look at the students’ responses. To make the moving-and-looking more expeditious, Johnson suggested that teachers seat students in a U-shape arrangement with the teacher stationed at the center. For more on Johnson’s ideas, see his book *Every Minute Counts* (1982, Dale Seymour Publications).

Around that same time, educator William F. Johntz pioneered an initiative called Project SEED that I had the good fortune to be exposed to while in graduate school; see the essay about it on Henri Picciotto’s math education blog.^{1} SEED had an innovative and charming way of opening up an underused communication channel from the class to the teacher. In SEED classes, hand signals played a big role: hand signals for numbers, for operations, for equality and inequality; hand signals for agreement, disagreement, partial agreement, and confusion. A teacher could ask a class a question (plant a seed, if you will) and quickly reap a rich visual harvest of information, a panoramic representation of her students’ states of mind. This provided SEED teachers with even snappier feedback than Johnson’s paper-and-pencil method, though with some limitations (there are after all only so many hand signals you can teach a class^{2}).

In the 1990s I turned my efforts wholeheartedly towards mathematical research, with teaching as a side activity that I tried to perform competently and compassionately but which didn’t arouse my highest passions. I read what people like Sheila Tobias and Alan Schoenfeld and Uri Treisman and Liping Ma were writing, and some of their ideas affected my teaching, but mostly I taught my students in the same ways that I had been taught. In particular, when I asked a question, I waited until a reasonable number of hands were raised (or until I gave up on waiting for more hands to go up; I never felt comfortable cold-calling students). I would pick someone whose hand was raised (trying to pick whichever of the hand-raisers had spoken up the least so far that day), and then respond to that person as if The Class had just spoken to me through its Chosen Representative. But of course the students who spoke up weren’t representative of the class as a whole.

Fast forward a few decades to the Covid-19 pandemic. Suddenly I was teaching over Zoom with very little relevant experience. As my time permitted, I took some online classes in how to do online teaching, and one of the tricks I learned was Chat-Storming. I quickly grew enamored of it. In Chat-Storming, I ask a question and none of the students answer right away, because I’ve told them not to. Instead, students compose answers in the Chat field of their Zoom portal but don’t press Enter/Return until I say “Okay, submit your answers.”

Then a flood of feedback drops down on my head as all the students answer at once. If it were an auditory overlay of student responses, it would just be a roar of white noise, but it’s visible, searchable, interpretable. I can’t tell which students were quick and which were slow, but I can see at a glance what the students as a whole think, and also look at individual answers in as much detail as I wish. It’s as easy to visually scan the magic paper (in this case, Zoom’s chat log) as it is to scan the hand signals in a Project SEED classroom, and the responses have a higher information content. Chat Storms use the visual information channel William Johntz championed, but with more bandwidth. As a bonus for the teacher, students aren’t able to peek at each other to try to assess whether their instincts are right or wrong based on whether the apparently “top” students agree with them; they’re on their own, and must make up their own minds.

Now I’m back in the physical classroom again, but I still use Chat Storms because they’re the best way I know to create in-class engagement that also can be used for assessment of participation^{3} and gives me realtime feedback on what students understand and what they don’t.^{4}

Does anyone know who came up with the Chat Storm? If you have any leads, please share them in the Comments!

Among its other virtues, the Chat Storm taps into the seldom-utilized positive power of boredom. In the past, I would sometimes force a class to speak by using the brutal tactic of not saying anything. If a teacher pauses for long enough, someone will break the awkward silence, once the students realize that the teacher is willing to wait as long as it takes. Chat Storms do something similar. When a Chat Storm is going on, the classroom is a really boring place. Nothing is going on except a lot of people thinking and writing on their magic paper. If you’re a student in such a classroom, you’ll quickly realize that nothing interesting is going to happen; you might as well join in the thinking and writing.

Well, maybe that’s not entirely true. If you’re stuck in a silent classroom with nothing but a smartphone, there are approximately infinity things^{5} you can do on your phone besides participating in a Chat Storm. And indeed some students who are prone to being distracted by their phones (such as students with ADHD) have told me they prefer a more traditional style of teacher-student interaction. But the majority of students like what I’m doing; they find that Chat Storms are enjoyable, keep them engaged, and provide feedback on how they’re doing.

“Excuse me, professor: could you give an example of a Chat Storm?” (I hear a reader of this essay ask).

Excellent question! I’m so glad you asked that!

When I assigned my discrete mathematics class the task of forming the logical negation of “Everybody’s a critic” using the Chat Storm format, I got a few expected wrong answers of the form “Nobody’s a critic” but also a couple of instances of a wrong answer I hadn’t expected: “At least one person is a critic.” This led to an unplanned discussion of the meaning of negation that I hadn’t realized the students needed, and I enunciated a criterion I hadn’t thought of before: if you can imagine a universe in which *p* and *q* are both false, *or* a universe in which *p* and *q* are both true, then *p* and *q* are not negations of each other. The students found that criterion helpful; I think I’ll teach it on purpose in the future.

Here’s another example from my recent teaching: when explaining the basics of set theory I arranged a Chat Storm in which I solicited mnemonics for keeping ∪ (union) and ∩ (intersection) straight. In the past I have pointed out to students that ∪ looks like the U in the word Union, and that by a process of elimination the other symbol ∩ can be deduced to mean intersection (the “other thing”). Some of the students had the same mnemonic I’d come up with. But one suggested a mnemonic I hadn’t seen before, pointing out that ∩ resembles the lower-case “n” in “intersection”. I think that’s a keeper too!

I happen to use Zoom and Chat but I know of people who use other kinds of magic paper: Miroboards, Mentimeter, Mattermost, and others. There’s also polling software, but polls often need to be prepared in advance, and many polling systems only allow multiple choice. One thing I like about Chat is its spontaneity (I can whip up a Chat Storm on the whim of a moment) and its open-endedness (if a student wants to include a joke or comment in their answer the system will permit this expression of their individuality).

But going back to what I wrote in the first sentence of this essay, the fact is, the year 2022 *is* bad science fiction, just as 2020 and 2021 were; if I’d prophetically written an accurate account of the pandemic back in the 1970s and tried to sell it as a novel, no publisher would have touched it. (“Dear Sir: This blatant *Andromeda Strain* ripoff somehow manages to be scientifically over-detailed, politically implausible, mind-numbingly boring, and deeply depressing all at the same time.”) Yet the ongoing viral storm has had some silver linings. The best one is the advent of mRNA vaccines, which are likely to have marvelous applications to improving people’s health in years to come. But somewhere down my private list of silver linings I’d put this new way of engaging my students.

So if you walk by my classroom and see my students glued to their phones while I’m standing by silently, looking around at students who aren’t looking at me, don’t assume that nothing is happening; a Storm is probably brewing.

*Thanks to Sandi Gubin, Henri Picciotto, and my discrete mathematics students.*

**ENDNOTES**

#1: Project SEED was a wonderful embodiment of the idea that you can set students up to discover deep mathematical ideas for themselves through artfully constructed activities. A teacher might start a SEED class by asking “What do you get when you add an odd number of odd numbers?” and then guiding the ensuing discussion. I could say a lot more about the Project SEED approach to teaching and about the many aspects of it that resonate with me, but I wouldn’t do a better job than Henri Picciotto has already done in his essay.

#2: I wonder how subjects are taught at Gallaudet University. Since everyone there is proficient in American Sign Language, there’d be opportunities for a whole class to respond to a teacher simultaneously in ASL; that might work really well not just for math but for other subjects too.

#3: I’ve written C programs and shell scripts that allow me to quickly determine, using the .txt files Zoom creates, how often each student wrote something in the chat at that particular class meeting. I’m happy to share the programs with anyone who’s interested.

#4: It’s fun to structure a sequence of Chat Storms that allows students to advance their understanding in manageable increments, and deeply satisfying to watch things sink into their brains, as evidenced by rising performance over the course of ten minutes. Of course, sometimes hardly anyone gets the right answer to a question I’ve asked. That means I didn’t design that part of the class well, and that’s never an enjoyable discovery. But the discovery enables me to learn what’s not working and fix it.

#5: There’s a wonderful exhibit at the Boston Museum of Science illustrating just how versatile smartphones are. It’s a replica of a room full of dozens of appliances, every single one of which has been rendered more or less obsolete by the smartphone. Even the stapler? Well, there are apps that have the specific purpose of combining multiple pieces of magic paper into a single magic document. Some of my students use those apps when they submit their homework on Blackboard. So yes, even the stapler.

But first, where did polynomials come from?

**THE ART OF THE THING**

“Thing” is a marvelously flexible word, as are similar words like “*res*” and “*cosa*” that other languages have used to signify unspecified objects. Often the word denotes a group of people who have come together for some purpose: think of the Roman Republic (the “public thing”) or the Cosa Nostra (“Our Thing”). Curiously, the English word “thing” itself seems to have traveled in the opposite direction, starting out as meaning an assembly of people and ending up as meaning, well, any-thing. Math has made its own uses of nonmathematical words for indefinite objects: in Indian and Arabic algebra, the quantity being sought was often called “the thing”. It was natural for European algebraists to borrow this usage, and indeed Renaissance algebra was sometimes referred to as “the art of the thing”. (See Endnote #1.)

Of course polynomials predate Europe; mathematicians around the world used polynomials two thousand years ago, back in the days when math was made of words and algebra was rhetorical (see Endnote #2). But when you think of polynomials you probably think of the way they’re written in modern high schools, which means you’re thinking of the modern, symbolic approach to polynomials pioneered by Rafael Bombelli. I wrote about Bombelli in my essay on complex numbers. His 1572 book *L’Algebra* popularized a version of exponential notation similar to the one we use today and he gave clear rules for how to perform arithmetic operations on polynomials. For instance, here’s an approximate translation of Bombelli’s description of how we can multiply simple polynomials like 3*x*^{4} and 5*x*^{6} (where he refers to powers as “dignities” and exponents as “abbreviatures”):

*When one has to multiply dignities one adds the numbers of the abbreviatures written above, and from those will be formed an abbreviature of dignities, and the numbers that stand below that dignity are simply multiplied.*

That is, to multiply 3*x*^{4} by 5*x*^{6} we multiply the “numbers below” (the 3 and the 5) to get 15, and we add the “abbreviatures” (the 4 and the 6) to get 10, obtaining 15*x*^{10} as the answer.
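
Bombelli’s rule is exactly how one would multiply monomials in code today: multiply the coefficients, add the exponents. Here’s a minimal sketch in Python (the pair representation and the function name are mine, not Bombelli’s):

```python
# Represent the monomial c*x^e as the pair (c, e).
def multiply_monomials(m1, m2):
    """Bombelli's rule: multiply the 'numbers below', add the 'abbreviatures'."""
    (c1, e1), (c2, e2) = m1, m2
    return (c1 * c2, e1 + e2)

# 3x^4 times 5x^6 gives 15x^10:
print(multiply_monomials((3, 4), (5, 6)))  # (15, 10)
```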

Polynomial notation gave algebra a new economy of expression, but more important was the new viewpoint that polynomials brought in, relating solving equations to factoring polynomials. Consider the equation *x*^{2} + 2 = 3*x*; it’s not hard to check that *x* = 1 and *x* = 2 are solutions, but how can we be sure that there aren’t more? On the other hand, consider the equation (*x*−1)(*x*−2) = 0. It’s algebraically equivalent to *x*^{2} + 2 = 3*x* (to see why, expand the left hand side of (*x*−1)(*x*−2) = 0 and do some rearranging), but it’s more eloquent in explaining to us why there are no solutions we’ve overlooked. For, if *x* is equal to neither 1 nor 2, then *x*−1 can’t be 0 (because *x* isn’t 1) and *x*−2 can’t be 0 (because *x* isn’t 2); but then (*x*−1)(*x*−2), being the product of two nonzero numbers, can’t be 0. In modern algebra, the fact that the product of two nonzero numbers cannot be zero gets promoted from boring truism to organizing principle. In particular, it implies that a polynomial equation of degree *n* has at most *n* solutions.
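
A quick numerical check (my own, not part of the historical story) confirms both that the two equations are the same polynomial in disguise and that no solutions have been overlooked, at least over a wide range of integers:

```python
# Verify that x^2 + 2 = 3x and (x-1)(x-2) = 0 are the same equation rearranged,
# and that x = 1 and x = 2 are the only solutions in the tested range.
for x in range(-1000, 1001):
    assert (x**2 + 2) - 3*x == (x - 1)*(x - 2)  # same polynomial after rearranging

solutions = [x for x in range(-1000, 1001) if x**2 + 2 == 3*x]
print(solutions)  # [1, 2]
```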

The word “polynomial” goes back to the writings of René Descartes, who cobbled the word together using the Greek prefix “poly” meaning “many” and the Latin root “nomen” meaning “name”. Here “poly” refers to the fact that a polynomial can be a sum of many terms that feature *x* raised to various powers. When there’s just one term, as in the polynomial *x*^{4}, we refer to the polynomial as a *monomial* (even though “mononomial” would be more apt). When there are two terms, as in the polynomial *x*^{4} + *x*^{3}, we refer to the polynomial as a *binomial*. (See Endnote #3.)

**MAKING A DIFFERENCE**

Polynomials have many important applications in the sciences, but the following party trick is not one of them: Have a friend pick four counting numbers *a*, *b*, *c*, and *n* but **not** reveal them to you; instead, they are to reveal to you the values of the polynomial *ax*^{2} + *bx* + *c* at *x* = *n*, *x* = *n*+1, and *x* = *n*+2. You will then announce the values of the polynomial at *x* = *n*+3, *x* = *n*+4, and *x* = *n*+5.

For instance, say they picked *a*, *b*, *c*, and *n* to all equal 1 (but you don’t know that), so they reveal to you the values of the polynomial *x*^{2} + *x* + 1 at *x* = 1, 2, and 3: the numbers 3, 7, and 13. You start by writing those numbers in the first row of a table and computing differences beneath them:

The first row of this table consists of the three numbers your friend revealed (call them *r*_{1}, *r*_{2}, and *r*_{3}), with spaces reserved for the three numbers you’re going to eventually announce back. The second row gives the differences *r*_{2} − *r*_{1} and *r*_{3} − *r*_{2}, while the third row gives the difference-of-differences (*r*_{3} − *r*_{2}) − (*r*_{2} − *r*_{1}). (Check: 7 − 3 = 4, 13 − 7 = 6, and 6 − 4 = 2.) We say that the second row is the *difference sequence* of the first row (and likewise the third row is the difference sequence of the second row).

Now extend that bottom row by repeating its sole entry three more times:

Next, fill in the second row of the table in such a way that the third row is the difference sequence of the second row:

(Check: 8−6 = 2, 10−8 = 2, and 12−10 = 2. Of course I didn’t just guess the numbers 8, 10, and 12 at random and happen to be lucky; I calculated them successively as 6 + 2 = 8, 8 + 2 = 10, and 10 + 2 = 12.)

Finally, fill in the first row of the table in such a way that the second row is the difference sequence of the first row:

(13+8 = 21, 21+10 = 31, and 31+12 = 43.) You announce the numbers 21, 31, and 43, and your friend checks that these are indeed the values of 4^{2} +4+1, 5^{2} +5+1, and 6^{2} +6+1.

And now, if you really want to be impressive, you can race your friend to compute the next five terms. You’ll probably win by just extending your table out five more places, because your arithmetic will be simpler than theirs! It’s possible that you and your friend will get different answers for one of the numbers. Then you can pull out your phone, open the calculator app, and see who’s right and who’s wrong. My guess is that your friend is wrong, because the procedure they’re following involves more steps.

Why does the trick work? Look back at the table. Notice that the second row is an arithmetic progression, with each term increasing by the same amount (namely 2) as one progresses from left to right. I show in Endnote #4 that if you write the successive values taken on by a quadratic polynomial and compute the successive differences, you always get an arithmetic progression. That is, if you list successive values taken on by a polynomial function of degree 2, and you take the differences, those differences will be the successive values taken on by a polynomial function of degree 1. And this trick isn’t limited to polynomials of degree 2; see Endnote #5.
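
The whole trick is mechanical enough to automate. Here’s a sketch (the function name is mine) that performs exactly the three-row bookkeeping described above:

```python
def extend_quadratic(values, extra):
    """Given three successive values of a quadratic polynomial,
    predict the next `extra` values using a difference table."""
    r1, r2, r3 = values
    d = r3 - r2                  # last entry of the second row
    dd = (r3 - r2) - (r2 - r1)   # the constant third row
    out = list(values)
    for _ in range(extra):
        d += dd                  # extend the second row...
        out.append(out[-1] + d)  # ...then extend the first row
    return out

# The party trick: from 3, 7, 13 recover 21, 31, 43.
print(extend_quadratic([3, 7, 13], 3))  # [3, 7, 13, 21, 31, 43]
```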

We call the tables we generate in this way *difference tables*. This game of operating on polynomials to get new polynomials of lower degree (deriving a row of the table from the row above) and reversing the process (deriving a row of the table from the row below) is called the calculus of finite differences, not to be confused with the infinitesimal calculus of Isaac Newton and Gottfried Leibniz. (Incidentally, even before Leibniz and Newton were born, Indian mathematicians in Kerala applied the calculus of finite differences to the sine and cosine functions and obtained the power series expansions of these functions! I’ll tell you this story soon.)

I propose a puzzle you might wish to think about using the ideas of this section: Show that if *p*(·) is a polynomial of degree *d* and if *p*(1), *p*(2), . . . , and *p*(*d*+1) are all integers, then *p*(*n*) is an integer for all integers *n*. (An example of such a polynomial is my “Personal Polynomial” (5/2)*x*^{2} − (17/2)*x* + 16 which I wrote about last month.) For a solution, see Endnote #6.

**PATTERNS IN THE POWERS**

I have no talent for grave-robbing, nor am I consumed by curiosity about the respective roles played by heredity and environment in determining a person’s mathematical ability; but were I so talented, and so consumed, I’d consider digging up the bones of the Bernoullis to scrape together some of their DNA. These eight mathematicians constituted a kind of European mathematical nobility for nearly a century. If there’s anything like a math gene, one might suppose that the Bernoullis had it. (Do historians of science know anything about the women of that family? Given that they shared the heredity and environment of the men, one would expect that they too would have displayed mathematical talent even if they were never given a chance to develop it.)

The Bernoulli numbers — the sequence of numbers 1, 1/2, 1/6, 0, −1/30, 0, 1/42, 0, −1/30, 0, 5/66, … — were named after a member of the Bernoulli family, but he didn’t discover them. Priority belongs to the German weaver-surveyor-mathematician Johann Faulhaber (1580-1635) who had investigated them a hundred years earlier.

The roots of Faulhaber’s work were ancient. It had been known for millennia that 1 + 2 + 3 + … + *n* is equal to (1/2)*n*^{2} + (1/2)*n* for all *n*, and for nearly as long that 1^{2} + 2^{2} + 3^{2} + … + *n*^{2} is equal to (1/3)*n*^{3} + (1/2)*n*^{2} + (1/6)*n* for all *n* and that 1^{3} + 2^{3} + 3^{3} + … + *n*^{3} is equal to (1/4)*n*^{4} + (1/2)*n*^{3} + (1/4)*n*^{2} for all *n*.
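
These identities are easy to spot-check by machine. A throwaway verification (mine, not anything the ancients had), using exact rational arithmetic to avoid floating-point noise:

```python
from fractions import Fraction as F

# Check the three classical power-sum identities for n = 1, ..., 49.
for n in range(1, 50):
    assert sum(k for k in range(1, n + 1)) == F(1, 2)*n**2 + F(1, 2)*n
    assert sum(k**2 for k in range(1, n + 1)) == F(1, 3)*n**3 + F(1, 2)*n**2 + F(1, 6)*n
    assert sum(k**3 for k in range(1, n + 1)) == F(1, 4)*n**4 + F(1, 2)*n**3 + F(1, 4)*n**2
print("all three identities hold for n = 1, ..., 49")
```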

Faulhaber found a formula in which certain mysterious multipliers played a role. These multipliers are what we now call the Bernoulli numbers. Faulhaber computed the first dozen or so of them and then stopped; he had demonstrated a way to find them, albeit an arduous one, and that was progress enough for him.

Faulhaber’s work was forgotten for a century. Then two mathematicians working completely independently rediscovered what Faulhaber had known: the Japanese mathematician Seki Takakazu (1642-1708) and the Swiss mathematician Jacob Bernoulli (1654-1705). Neither mathematician published his results while he was still alive; Takakazu’s result was published in 1712, while Bernoulli’s was published in 1713 as *Summae Potestatum*.

Bernoulli’s elation at his discovery is evident from his words, and his joy is laced with a bit of *schadenfreude*:

*I have found in less than a quarter of an hour that the tenth powers of the first thousand numbers beginning from 1 added together equal 91,409,924,241,424,243,424,241,924,242,500, from which it is apparent how useless should be judged the works of Ismael Bullialdus, recorded in the thick volume of his Arithmeticae Infinitorum, where all he accomplishes is to show that with immense labor he can sum the first six powers — part of what we have done in a single page.*
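
Bernoulli’s quarter hour of hand computation now takes a machine a fraction of a second, even by brute force, and his 32-digit total checks out:

```python
# Sum of the tenth powers of 1 through 1000, computed by brute force.
total = sum(k**10 for k in range(1, 1001))
print(total)  # 91409924241424243424241924242500, exactly Bernoulli's number
```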

Bernoulli’s analysis laid more emphasis on the numbers he called A, B, C, D, etc. than Takakazu’s did, and Bernoulli was part of the European mainstream, so it’s natural that Leonhard Euler named these numbers Bernoulli numbers and not Takakazu numbers.

**STEAM DREAMS**

The English mathematician Charles Babbage had a dream.

It started in 1821, when young Charles and his friend John Herschel, charter members of the newly founded London Astronomical Society, were bemoaning the unreliability of published astronomical tables. Existing tables often had errors, whether caused by the people who computed the numbers (called “computers” in those days) or by the typesetters who recorded the numbers in print. Meanwhile, elsewhere in England, steam was doing amazing things to amplify the powers of the human body. As Babbage would later write:

*Mr. Herschel . . . brought with him the calculations of the computers, and we commenced the tedious process of verification. After a time many discrepancies occurred, and at one point these discordances were so numerous that I exclaimed, “I wish to God these calculations had been executed by steam,” to which Herschel replied, “It is quite possible.”*

Mechanical adding machines already existed; the philosopher-mathematician Blaise Pascal had himself invented one in 1642. Babbage realized that by taking the components of such machines and hooking them together in new ways, he could create machines that could automatically tabulate the values of polynomial functions.

Let’s see how a small Difference Engine (for that is what Babbage called his invention) could have tabulated the values of the polynomial *n*^{2} + *n* + 1 that we met at a party earlier in this essay. Picture a machine with three numerical registers that evolve over time, with each number represented by the states of various gears as in a Pascal adding machine. At the beginning of the machine’s performance of its task the registers show the three numbers 3, 4, and 2:

(never mind why those specific numbers, at least for now). Then the device adds the 4 in the second register to the 3 in the first register and adds the 2 in the third register to the 4 in the second register, leaving the 2 in the third register alone:

(Here the straight and slanted vertical lines indicate how the new values of the registers are sums of the old values: 7 is 3+4, 6 is 4+2, and 2 is just 2.) Then the device adds the 6 in the second register to the 7 in the first register and adds the 2 in the third register to the 6 in the second register, once again leaving the 2 in the third register alone:

Then it executes the same procedure again:

And so on.

Do these numbers look familiar? They should! Babbage’s machine is constructing the difference table

one diagonal at a time. If we wanted a table of values of the polynomial *n*^{2} + *n* + 1, all we’d have to do is connect the Difference Engine to a suitable printer and have it print out the numbers that successively occur in the first register.
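
Babbage’s register dance is easy to mimic in software. Here’s a sketch (mine, with the starting registers 3, 4, 2 from above) of the update rule the gears perform:

```python
def difference_engine(registers, steps):
    """Simulate a small Difference Engine: at each step, add each register's
    right-hand neighbor into it, leaving the last register alone."""
    regs = list(registers)
    tabulated = [regs[0]]  # the first register holds the values being tabulated
    for _ in range(steps):
        for i in range(len(regs) - 1):
            regs[i] += regs[i + 1]  # left to right, so each add uses the neighbor's old value
        tabulated.append(regs[0])
    return tabulated

# Starting from registers 3, 4, 2, the machine tabulates n^2 + n + 1:
print(difference_engine([3, 4, 2], 5))  # [3, 7, 13, 21, 31, 43]
```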

Earlier I wrote: “Polynomials have many important applications in the sciences, but the following party trick is not one of them.” That’s true enough. But we also saw (in the final part of the trick) that the paper-and-pencil technology of difference tables gave you a better way to compute values of your friend’s polynomial than your friend’s straightforward way. Also, we just saw how Babbage realized that you and your hand could be replaced by infallible (or at least less-fallible) gears, and that the more-accurate tables produced by such mechanical processes would have scientific (and industrial and military) uses. So yes, it’s a party trick. But it’s a party trick with a direct thematic link to Babbage’s Difference Engine. And as we’ll soon see, Babbage didn’t stop there.

**ADA LOVELACE**

Annabella Milbanke was not exactly George Gordon’s “type”, and he was certainly not hers. The latter circumstance was part of her appeal; her initial indifference intrigued and attracted him. (Or so at least I surmise; I wasn’t there.) But let’s give Annabella and George their titles: Annabella Milbanke Baroness Wentworth and George Gordon Lord Byron. Yes, that Byron. The memorable description of the iconic poet and scoundrel as “mad, bad, and dangerous to know” was coined not by his enemies but by his more-than-friend Lady Caroline Lamb. Milbanke would be known in her later years as a champion of progressive causes, such as vaccination. Byron admired the intelligent young Annabella and in a letter to her aunt dubbed the bookish heiress a “Princess of Parallelograms”. Annabella rejected his first proposal of marriage but unwisely accepted the second. When a libertine and a moralist fall in love in a romantic comedy, each has a moderating influence on the other, but in real life, clashing natural proclivities can lead to ever-greater polarization, and such I think was the case with Annabella and George. After only a few years their marriage ended (“I heard she moved out! She gave up on the marriage!” “Well, *I* heard she only moved *out* after he brought one of his lovers *in*!”), but before that they had a baby together, a child who was mere months old when her parents separated.

Young Ada showed an even greater aptitude for mathematics than her mother had and Annabella encouraged the girl, hoping that the pursuit of mathematics, and more broadly the cultivation of her faculties of reason, would protect her from the genetic taint of her father’s unbalanced temperament (or, some might say, insanity). The end result of Annabella’s efforts was a daughter who pursued what Ada called “poetical science”, eventually earning her (from an admiring Babbage) a nickname that surpassed the one Byron had given her mother: “Enchantress of Number”.

Ada met Babbage in 1833 at a party at Babbage’s house. The teenager must have made a good impression on the quadragenarian, because he invited her and her mother to attend a demonstration of his newly constructed Difference Engine (a prototype of a larger version he hoped to build): a two-foot-tall machine with 2000 brass parts that was powered by a hand crank and could have printed out mathematical tables if only Babbage had completed the envisioned printer that went with it.

Babbage’s plans for his Difference Engine ran aground on the shoals of several problems. One was that machining parts to the required precision was a more arduous and expensive proposition than he had realized when he first proposed his scheme to the British government as a way to automate the production of mathematical tables. Someone with more business sense or political savvy might have been able to handle these delays and overruns, but the unworldly Babbage wasn’t up to the task. Besides, he was distracted by a vision of an even greater machine that could solve more interesting problems than computing values of polynomials. He called the envisioned device his Analytical Engine.

In 1840, Babbage, having failed to interest the British government in supporting his work, visited Italy to give a talk about the Analytical Engine and thereby drum up enthusiasm and funding overseas. His talk inspired the Italian mathematician Luigi Menabrea to publish a description of the engine in French. Ada (now married to William King-Noel, First Earl of Lovelace) took on the project of translating Menabrea’s article into English, and decided to enhance the article with Notes of her own (notes so comprehensive that they eventually dwarfed what Menabrea had written). In her notes, Lovelace limns a future in which mechanical devices will be able to do such things as compose music, if only we tell them quite precisely the mathematical rules governing musical composition. But “Sketch of the Analytical Engine Invented by Charles Babbage, with Notes by the Translator” isn’t just a prophecy of a coming age of machine intelligence; it also contains what some have called the first true computer program. And that program computes Bernoulli numbers.

Computing Bernoulli numbers is a good deal more complicated than computing the values of polynomials; see Target’s article if you want to know more. One reason Lovelace chose this computing challenge was that it showcased a feature of the Analytical Engine that the Difference Engine had lacked: the ability of a computation to enter a loop, or as Babbage put it, to “eat its own tail”. Nowadays we recognize the momentous significance of the difference between the two designs: in principle, a machine like the Analytical Engine had crossed over into the domain of universal computation.

Whether or not one chooses to regard Lovelace’s program as the first computer program ever written, it was certainly the most complicated set of instructions for a mechanical computation that had ever been described up till then. Computer science bloggers Jim Randall and Sinclair Target and Stephen Wolfram noticed that at one point in her program there’s a mistake: the numerator and denominator of a fraction have been swapped. The first programmer was also the creator of the first programming bug! But I don’t think the bug does her any discredit; it points to the magnitude of her ambition and the complexity of the task she had chosen to undertake. As Target asks, if you aren’t writing bugs, are you writing real programs?

Alas, Babbage’s dreams were bigger than his budget and his managerial capabilities. The Engine was never built. Ada died of cancer in 1852 at the age of 37, and Charles died in 1871, a bitter and disappointed old man. The Victorian Computer Age never dawned (though authors of steampunk fiction keep wondering “What if …?”).

*If* the Analytical Engine had been built in her lifetime, I have no doubt that Lovelace would have found the bug in her program. And while we’re talking about mistakes, did any of you notice that the page from Bernoulli’s “Summae Potestatum” that I showed you earlier contains an error? That last term in the second-to-last polynomial in his table should be −3/20 *nn*, not −1/12 *nn*. Bernoulli was a creative human being, not a mathematical engine. (Though I have no doubt that *if* you’d told Bernoulli he’d made a mistake in his table, he would’ve found it in a lot less than fifteen minutes.)

The history of math shows time and time again that the giants of mathematics aren’t flawless paragons of reason who never err; they’re humans who discover new vistas, explore them, have creative ideas (some of which work), inevitably stumble as they traverse landscapes never seen before, recover from their stumbles, and move on. Indeed, the creative faculty of human beings — not to be found in the Analytical Engine, which Lovelace famously wrote can only do “whatever we know how to order it to perform” — may share roots with the human propensity for error. The Latin root for “error” is the word for wandering. If we don’t wander off beaten paths, how will we know what vistas we’re missing?

*Thanks to Sandi Gubin, Eliana Propp-Gubin, and Stephen Wolfram.*

**REFERENCES**

Sarah Baldwin, Ada Lovelace and the Analytical Engine.

Janet Beery, Sums of Powers of Positive Integers – Jakob Bernoulli (1654-1705), Switzerland.

Peter Cameron, Polynomials taking integer values.

A. W. F. Edwards, “Sums of powers of integers: a little of the history”, The Mathematical Gazette, Vol. 66, No. 435 (Mar., 1982), pp. 22-28.

Martin Gardner, “The Calculus of Finite Differences”, chapter 20 in “New Mathematical Diversions”.

Silvio Maracchia, The importance of symbolism in the development of algebra, *Lettera Matematica* volume 1, pages 137-144 (2013).

Burkard Polster, “Power sum MASTER CLASS: How to sum quadrillions of powers … by hand! (Euler-Maclaurin formula)”: Mathologer channel, https://youtu.be/fw1kRz83Fj0 .

Duana Saskia, Discovering Ada’s Bernoulli Numbers, Part 1. (Alas, there seems to be no Part 2!)

Sinclair Target, What Did Ada Lovelace’s Program Actually Do?

Stephen Wolfram, Untangling the Tale of Ada Lovelace.

**ENDNOTES**

#1. Algebraists were sometimes called “cossists”, which I suppose could be translated as “thingologists”.

#2. The wordy kind of number-talk that people used in the old days really is called “rhetorical algebra”. I’m not kidding. Maybe high school algebra teachers today should spice things up in the classroom by using a broader range of classic rhetorical devices: *accismus* (“Whatever you do, *don’t* subtract *x* from both sides!”), *adynata* (“You’ll find a solution to *x* = *x*+1 when cows do calculus”), *antimeria* (“Let’s see if our old friend Mister Quadratic Formula can help us out here”), etc.

#3. The word “binomial” occurs in a famed assemblage of late-19th-century English cultural trivia called “The Major General’s Song”, from Gilbert and Sullivan’s operetta *The Pirates of Penzance*. In the song, career soldier Major General Stanley, while conceding his complete lack of military knowledge, boasts:

*I’m very well acquainted, too, with matters mathematical;/ I understand equations, both the simple and quadratical./ About binomial theorem I’m teeming with a lot o’ news / With many cheerful facts about the square of the hypotenuse.*

Likewise, in Conan Doyle’s story “The Final Problem”, we are meant to infer that Sherlock Holmes’ nemesis Moriarty is Holmes’ intellectual equal when Holmes tells Watson:

*“He [Moriarty] is a man of good birth and excellent education, endowed by nature with a phenomenal mathematical faculty. At the age of twenty-one he wrote a treatise upon the Binomial Theorem, which has had a European vogue. On the strength of it he won the Mathematical Chair at one of our smaller universities, and had, to all appearances, a most brilliant career before him.”*

Of course the binomial theorem of Newton was old news by the time Holmes came on the scene. But I like to imagine that Holmes was thinking of the *q*-binomial theorem, which (as part of the broader subject of *q*-series) was a hot topic on the Continent in the 19th century.

#4. Write the polynomial *a* *x*^{2} + *b* *x* + *c* as *p*(*x*) for short, and define *q*(*x*) as *p*(*x*+1) − *p*(*x*), so that the numbers in the second row of the table are *q*(*n*), *q*(*n*+1), . . . Expanding the expression *p*(*x*+1) − *p*(*x*) and regrouping, we get

*q*(*x*) = (*a* (*x*+1)^{2} + *b* (*x*+1) + *c*) − (*a* (*x*)^{2} + *b* (*x*) + *c*) = *a* ((*x*+1)^{2}−*x*^{2}) + *b* ((*x*+1)−*x*) + (*c*−*c*) = *a*·(2*x*+1) + *b*·(1) = (2*a*)*x* + (*a*+*b*) ,

a linear function of *x*. So the numbers in the second row form an arithmetic progression.
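Readers who like to check such identities numerically can do so in a few lines. The sketch below (my code, not the essay's; the coefficient choices *a* = 3, *b* = −5, *c* = 7 are arbitrary) confirms that the first differences of a quadratic obey the linear rule (2*a*)*x* + (*a*+*b*):

```python
# Numerical check: the first differences of p(x) = a x^2 + b x + c
# follow the linear rule q(x) = (2a)x + (a + b), so they form an
# arithmetic progression with constant step 2a.

a, b, c = 3, -5, 7

def p(x):
    return a * x**2 + b * x + c

def q(x):                       # the linear function derived in the text
    return (2 * a) * x + (a + b)

diffs = [p(n + 1) - p(n) for n in range(9)]
print(diffs == [q(n) for n in range(9)])             # True
print({diffs[i + 1] - diffs[i] for i in range(8)})   # {6}: constant step 2a
```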

#5. If your friend gives you *d*+1 successive values of some polynomial of degree *d*, you can find successive terms using a trapezoidal array with *d*+1 rows instead of just three. That’s because if *p*(*x*) is a polynomial of degree *d*, the polynomial *p*(*x*+1) − *p*(*x*) simplifies to give a polynomial of degree *d*−1. So if the top row of your table gives *d*+1 successive values of a polynomial of degree *d*, the second row will give *d* successive values of a polynomial of degree *d*−1; likewise the next row will give *d*−1 successive values of a polynomial of degree *d*−2; and so on, with the *d*th row giving 2 successive values of a polynomial of degree 1 and the final (*d*+1st) row giving the value of a polynomial of degree 0. And a polynomial of degree 0 is just a constant function of *x*; once you know its value at *x* = *n*, you know its value at *x* = *n*+1, *n*+2, etc.! So you can fill in the last row, which lets you fill in the second-to-last row, and so on, all the way back up to the top. So you can fill in that top row and announce those numbers while your friend is still squaring and swearing.
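The trapezoidal-array trick is easy to mechanize. Here's a minimal sketch (my code, not the essay's) that extends a sequence of polynomial values without ever learning the polynomial's coefficients:

```python
# Extend d+1 successive values of a degree-d polynomial via a difference
# table: build rows of differences until a constant row appears, then
# fill the table back in from the bottom up, left to right.

def extend(values, extra):
    """Append `extra` further terms to `values` using a difference table."""
    rows = [list(values)]
    while len(rows[-1]) > 1:                   # build the rows of differences
        prev = rows[-1]
        rows.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    for _ in range(extra):
        rows[-1].append(rows[-1][-1])          # bottom row is constant
        for r in range(len(rows) - 2, -1, -1): # fill upward toward the top row
            rows[r].append(rows[r][-1] + rows[r + 1][-1])
    return rows[0]

# Example: from the squares 1, 4, 9 alone, predict the next four squares.
print(extend([1, 4, 9], 4))   # [1, 4, 9, 16, 25, 36, 49]
```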

#6. Since the top row consists of integers, the whole triangle beneath it consists entirely of integers as well (since the difference of two integers is always an integer). But the bottom row represents a constant polynomial, so the bottom row extends to give infinitely many repetitions of that integer. Now fill in the difference table going upward; the sum of two integers is always an integer, so you’ll never see any non-integer values anywhere in the extended table, including in its top row. So this tells us that *p*(*d*+2), *p*(*d*+3), etc. are all integers as well. Technically this argument only handles *p*(*n*) as *n* goes to positive infinity, not as *n* goes to negative infinity, but by extending the difference table to the left as well as the right we can take care of this case too. I learned of this pretty application of difference tables from Peter Cameron’s blog (see the References).

#7. If you take the sequence whose *n*th term is 1^{k} + 2^{k} + 3^{k} + … + *n*^{k} and form its difference sequence, you’ll just get the sequence of *k*th powers whose terms are of course given by a polynomial of degree *k*. You might say that the sequence 1^{k}, 1^{k}+2^{k}, 1^{k}+2^{k}+3^{k}, … is the “anti-difference” of the sequence 1^{k}, 2^{k}, 3^{k}, … Since taking the difference sequence of a polynomial decreases its degree by 1, it makes sense that taking the anti-difference increases the degree by 1. And this fact from the calculus of finite differences foreshadows something similar that happens in the infinitesimal calculus, where differentiating a polynomial reduces its degree by 1 and anti-differentiating it increases its degree by 1. This is just one of many profound similarities between the calculus of finite differences and the differential calculus.
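The anti-difference observation can be verified directly; this sketch (mine, for illustration) checks that differencing the partial sums of cubes recovers the cubes themselves:

```python
# Differencing the partial sums 1^k + 2^k + ... + n^k recovers the
# sequence of k-th powers, illustrated here with k = 3.

k = 3
sums, total = [], 0
for n in range(1, 11):
    total += n**k
    sums.append(total)                 # 1^k, 1^k + 2^k, 1^k + 2^k + 3^k, ...

diffs = [sums[i + 1] - sums[i] for i in range(len(sums) - 1)]
print(diffs == [n**k for n in range(2, 11)])   # True
```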

Mathematicians celebrate the French thinker René Descartes for inventing Cartesian coordinates.^{1} But we should also remember him as the person who tilted the terrain of Europe’s mathematical alphabet, using early letters of the alphabet to signify known quantities and imbuing later letters (especially *x*) with the pungent whiff of the Unknown. If you learned to write quadratic expressions as *ax*^{2} + *bx* + *c* instead of *xa*^{2} + *ya* + *z* (and I’m guessing you did), it’s down to Descartes.^{2}

My topic this month is polynomials like *ax*^{2} + *bx* + *c*. In school math, you first learned about *x* as an unknown, a number hiding behind a mask. (“What is *x*? Let’s find out.”) Later you learned to view *x* as a variable, so that a formula like *y* = *ax*^{2} + *bx* + *c* is a function or rule: if you give me an *x*, I’ll give you a *y*. (“What is *x*? No number in particular; *x* ranges over all real numbers.”) I’ll touch on both points of view today, but I’ll be stressing a viewpoint that’s probably less familiar, where *x* is neither an unknown nor a variable, but just, well, itself. From this perspective, polynomials appear as number-like objects in and of themselves, with their own habits and mating behavior.

Let’s start with something a bit silly. The Global Math Project website has a polynomial with your name on it. Literally. Go to http://globalmathproject.org/personal-polynomial/ and type in your name, and the website will give you a mathematical expression that (unless you go by “JJ” or some other moniker whose letters are all the same) will contain one or more occurrences of the variable *x*; say hello to your Personal Polynomial. What makes it your Personal Polynomial is that if you replace *x* by the number 1, the expression turns into the numerical value of the 1st letter of your name (where A = 1, B = 2, etc.); if you replace *x* by the number 2, the expression turns into the numerical value of the 2nd letter of your name; and so on. For instance, when I typed in “JIM” the Personal Polynomial genie gave me the degree-two polynomial (5/2)*x*^{2} − (17/2)*x* + 16; with *x* = 1, my Personal Polynomial turns into 5/2 − 17/2 + 16 = 10, and sure enough, the 1st letter of my name is the 10th letter of the alphabet, J. Likewise, plugging in *x* = 2 gives 9 (the numerical value of I) and plugging in *x* = 3 gives 13 (the numerical value of M). That is, the computer found magic numbers *a*, *b*, and *c* with the property that the polynomial *p*(*x*) = *ax*^{2} + *bx* + *c* satisfies the three equations *p*(1) = 10, *p*(2) = 9, and *p*(3) = 13.

Try it! If you give the program a name with *n* letters, it’ll reply with a polynomial of degree *n*−1 or less that does the job.^{3}
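If you'd rather check the “JIM” example from your own keyboard, here's a small verification (my code, not the website's; exact fractions sidestep floating-point fuzz):

```python
# Verify that p(x) = (5/2)x^2 - (17/2)x + 16 spells "JIM":
# p(1), p(2), p(3) should be 10, 9, 13, the alphabet positions of J, I, M.
from fractions import Fraction

def p(x):
    return Fraction(5, 2) * x**2 - Fraction(17, 2) * x + 16

name = ''.join(chr(ord('A') + int(p(i)) - 1) for i in (1, 2, 3))
print(name)   # JIM
```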

When you use the Personal Polynomial website it’s hard to see anything sinister lurking nearby. But algebraic expressions have a seductive slickness that can lead us to invest more faith in them than we should. Although you’d be unlikely to be so extremely silly as to plug *x* = 4 into my Personal Polynomial and then announce that the fourth letter of “JIM” should be a V (or that the letter after that should be the thirty-sixth letter of the alphabet), certain American policy-makers did something similar during the early days of the Covid epidemic: they fitted a degree-three polynomial to epidemiological data from the past and used it to predict the future.^{4} You can read more about this in Jordan Ellenberg’s book *Shape*, though for a tongue-in-cheek demonstration of the pitfalls of extrapolation it’s hard to beat what Mark Twain wrote over a century ago:

In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oölitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upwards of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.

Mark Twain, *Life on the Mississippi*

**POLYNOMIALS SET FREE**

Students in algebra classrooms see polynomials in action in all kinds of ways. I already demonstrated *substitution* when I plugged *x* = 1 into (5/2)*x*^{2} − (17/2)*x* + 16 and got 10. There’s also the reverse problem of *solving* for *x*: if all I tell you about *x* is that (5/2)*x*^{2} − (17/2)*x* + 16 equals 10, what values might *x* have? 1 is one such value, but are there others?^{5} Here *x* plays the role of the unknown, the not-yet-known, the about-to-be-known, etc. And when you see a formula like *y* = *ax*^{2} + *bx* + *c*, chances are you’ll be asked to *graph* it.

But what I want to describe to you this month are generating functions, in which *x* is no longer a stand-in for an unknown number; *x* stands proudly on its own. And it brings along some friends.

In a way, generating functions are reminiscent of the numbers Euler gave us when he admitted a new player *i* into the number game and saw what new numbers it gave rise to, using no information about *i* except the equation *i*^{2} = −1. The big difference is that, whereas Euler decreed that *i*^{2} should equal −1, we won’t decree anything about *x*^{2} at all, or *x*^{3} or any higher powers (though we will decree that *x*^{0} equals 1). We remain agnostic about what *x* means and just manipulate expressions according to rules, like the rule that says that *x*^{a} times *x*^{b} equals *x*^{a+b} (“Exponents add when you multiply powers”). *x* has been freed from its obligation to refer to anything outside of itself.

The noted mathematician Herb Wilf^{6} once wrote that “A generating function is a clothesline on which we hang up a sequence of numbers for display.” Wilf wrote the book on generating functions, literally, but I would like to contest, or at least elaborate upon, the word “display”. Clotheslines aren’t just for display; they’re also useful. And so are generating functions. I’ll give three applications, all of which have to do with dice.

**ADDING DICE ROLLS AND MULTIPLYING POLYNOMIALS**

Thousands of years ago, our ancestors felt threatened by mighty and seemingly lawless forces in the air above and the earth beneath and attributed them to mighty powers who might be propitiated by sacrifices or whose actions might at least be predicted through suitable forms of divination. (This was before we had degree-three polynomials.) The unpredictability of processes like the casting of lots seemed to have an affinity with the capriciousness of Nature, and so such processes were thought to provide a kind of channel to the supernatural powers that controlled both.

The ankle-bone (or “talus”) of a hoofed animal would have seemed like a natural choice of divinatory device: these four-sided objects were plentiful, their origin linked them to life and death, and if you tossed one, it was hard to know which side would face upward when it stopped rolling. It’s theorized that over time such oracular bones evolved into cubical dice. In any case we know that even if dice were invented for divinatory purposes, they became coopted for games of chance quite early; some of the oldest dice archeologists have found are dice that have been tampered with in such a way as to bias the outcome (so-called “crooked” or “gaffed” dice). This might indicate an attempt to affect the hazards of weather or warfare but seems more likely to indicate an attempt to acquire wealth at the expense of others. (Even today, we acknowledge the dare-I-say dicey nature of personal finance by referring to a large sum as a “fortune”.) Dice games were commonplace by the time European mathematicians like Pascal and Fermat laid the groundwork for the modern theory of probability in the 1600s.

Consider a die of the modern kind, with six faces showing the numbers 1 through 6 in the form of dots, or “pips” (where a face with 1 pip signifies the number 1, a face with 2 pips signifies the number 2, and so on). We’ll give this die its own Personal Polynomial (yeah, dice aren’t persons, but you know what I mean): the polynomial *x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6}. That is, we take *x* to the power of each of the numbers shown on the die’s six faces, and we add those powers. (In precollege math it’s usual to write polynomials with the highest-degree term first and the lowest-degree term last, but for generating functions it’s more useful to do the reverse, so that the exponents count up instead of down.)

Why is it useful to associate polynomials with dice in this way? Because (as we’ll see) when we multiply the polynomials associated with two dice, we get useful information about what happens when we roll both dice and add the numbers that they show. Similarly, when we square the polynomial associated with one die, we get useful information about what happens when we roll the die twice and add the two numbers we see.

Before we dive into multiplying a degree-six polynomial by itself, let’s take the simpler example of a two-sided die (better known as a “coin”). Suppose the coin has 1 pip on one side and 2 pips on the other side. What can happen when you toss it twice and record the sum? You might get a 1 followed by a 1, or a 1 followed by a 2, or a 2 followed by a 1, or a 2 followed by a 2. So the total can be 2, 3, 3, or 4. We can make a two-by-two table of the four possibilities, which the alert reader (or even a woozy one) may recognize as a very small addition table that shows the sums you can get when you add 1-or-2 plus 1-or-2.

Since exponents add when you multiply powers, we find a similar pattern when we form a very small multiplication table that shows the products you can get when you multiply *x*^{1}-or-*x*^{2} times *x*^{1}-or-*x*^{2}.

It’s no coincidence that these are exactly the terms you get when you multiply *x*^{1}+*x*^{2} by itself and expand using the distributive law (or using “FOIL”, if you insist).

Likewise, imagine a three-sided die with 1 pip, 2 pips, and 3 pips on its three respective sides. We can make a table of all nine possibilities.

Reading the table, we find 1 way to roll a 2, 2 ways to roll a 3, 3 ways to roll a 4, 2 ways to roll a 5, and 1 way to roll a 6. Alternatively, we can multiply *x*^{1} + *x*^{2} + *x*^{3} by itself and collect like terms, obtaining the generating function (*x*^{1} + *x*^{2} + *x*^{3})^{2} = *x*^{2} + 2*x*^{3} + 3*x*^{4} + 2*x*^{5} + *x*^{6}.

The deluxe version of the distributive law says that if you have one sum of numbers and you multiply it by another sum of numbers, you have to individually multiply each number in the first sum by each number in the second sum and then add up all the resulting products, being careful not to leave any out or include any twice. Meanwhile, a multiplication table is designed to force us to record each possible product once and only once by giving us a well-defined place to record the answer. So the nine terms in the expansion correspond to the nine entries in the small multiplication table.
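In code, this deluxe distributive law is just a convolution of coefficient lists. Here's a sketch (my own helper, under the convention from the text that exponents count up, so `coeffs[k]` holds the coefficient of *x*^{k}):

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (coeffs[k] ~ x**k)."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b   # x**i times x**j lands on x**(i+j)
    return result

die3 = [0, 1, 1, 1]         # x^1 + x^2 + x^3
print(poly_mul(die3, die3)) # [0, 0, 1, 2, 3, 2, 1]: x^2 + 2x^3 + 3x^4 + 2x^5 + x^6
```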

Now, at this point you should be skeptical of the usefulness of generating functions. If all we care about is the respective numbers of ways to roll a 2, 3, 4, 5, or 6 using two rolls of our three-sided die, then the extra baggage of those plus-signs and powers of *x* seems like mere typographical distraction from the real message. But suppose we want to know whether two rolls of a six-sided die (the kind of die people actually use in games) is likelier to give us a sum that’s odd or even. We can use generating functions to solve this problem just by *thinking* about the generating function

*p*(*x*) = (*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6})^{2} = *x*^{2} + 2*x*^{3} + … + 2*x*^{11} + *x*^{12}

without actually expanding it out and writing down the intermediate terms.^{7}

**NEGATIVE ONE LENDS A HAND**

The trick involves looking at *p*(−1). (This essay is called “Let x equal x”^{8}, but that doesn’t mean I won’t sometimes want to let x equal some specific number like −1.) Actually, let’s first look at what we get when we replace *x* by −*x* in *p*(*x*). Remember that the polynomial *p*(*x*) is *x*^{2} + 2*x*^{3} + … + 2*x*^{11} + *x*^{12}. So *p*(−*x*) is (−*x*)^{2} + 2(−*x*)^{3} + … + 2(−*x*)^{11} + (−*x*)^{12}, which equals *x*^{2} − 2*x*^{3} + … − 2*x*^{11} + *x*^{12}. The coefficients don’t change in magnitude, but every second plus-sign gets flipped to a minus-sign. Notice that the outcomes in which the sum of the two numbers that are rolled is even correspond to terms with even exponent, which get left alone; but the outcomes in which the sum of the two numbers that are rolled is odd correspond to terms with odd exponent, whose sign gets flipped. So now, if you replace *x* by 1 in *p*(−*x*) (which is the same as replacing *x* by −1 in *p*(*x*)), you get the sum 1 − 2 + … − 2 + 1 in which the positive terms correspond to the ways to roll an even sum and the negative terms correspond to the ways to roll an odd sum.

That means in the contest between the forces of Even and the forces of Odd, we can assess the balance of power by seeing whether *p*(−1) is positive (in which case the outcomes with even sum outnumber the outcomes with odd sum) or negative (in which case the outcomes with odd sum outnumber the outcomes with even sum).

And here’s where the deus-ex-algebra descends upon the scene. Remember, *p*(*x*) equals (*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6})^{2} , so *p*(−1) equals (−1+1−1+1−1+1)^{2}. But that’s just 0^{2}, or 0. So it’s a stand-off between Even and Odd. That is, when you roll two dice, the sum has an equal chance of being even or odd.^{9}
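The *p*(−1) shortcut takes only a few lines of code (a sketch of mine, not part of the essay): evaluate the un-expanded product at *x* = −1 instead of multiplying it out.

```python
# Evaluate the squared die polynomial at x = -1: a result of 0 means the
# forces of Even and Odd are perfectly balanced.

def evaluate(coeffs, x):
    return sum(c * x**k for k, c in enumerate(coeffs))

die6 = [0] + [1] * 6             # x^1 + x^2 + ... + x^6
print(evaluate(die6, -1) ** 2)   # 0: even and odd sums are tied
```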

Now you still may be thinking you’d rather not think so hard; you (or an algebraic calculator) could just expand (*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6})^{2} as

1*x*^{2} + 2*x*^{3} + 3*x*^{4} + 4*x*^{5} + 5*x*^{6} + 6*x*^{7} + 5*x*^{8} + 4*x*^{9} + 3*x*^{10} + 2*x*^{11} + 1*x*^{12}

and then just check that 1 + 3 + 5 + 5 + 3 + 1 (the sum of the coefficients of the even powers of *x*) equals 2 + 4 + 6 + 4 + 2 (the sum of the coefficients of the odd powers of *x*). But the real strength of the more abstract approach lies in how well it scales. Because: The problem I *really* wanted to ask you involves rolling a six-sided die *six* times. (Rolling it just twice was only a warm-up.)

So, suppose you roll a six-sided die six times. The generating function for the sum of the six numbers that you roll is (*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6})^{6}, a polynomial with 31 terms that I would never dream of writing out by hand (or if I did dream about it I’d call it a nightmare). But if you just want to know whether the sum of the six numbers is likelier to be even or odd, you can just plug in *x* = −1, obtaining (−1+1−1+1−1+1)^{6}, or 0. So once again, an even sum and an odd sum are equally likely.

Challenge problem: Suppose we have a (fair) five-sided die. (It’s easy to use a fair six-sided die to simulate a fair five-sided die: just roll it in the ordinary way, and if the outcome is a 6, look around furtively, announce “That didn’t just happen” and roll it again, and keep at it until you get a 1, 2, 3, 4, or 5.) If you roll your five-sided die five times, do you think the sum is likelier to be odd or even? What if you roll it six times? Generating functions give a quick route to the answer.^{10}
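If you'd rather experiment than reason it out, here's a small helper (mine, not the essay's): the parity balance for *k* rolls of a fair *n*-sided die with faces 1 through *n* is *f*(−1)^{k}, where *f* is the die's generating function. Try it yourself with *n* = 5 and *k* = 5 or 6.

```python
def parity_balance(n, k):
    """f(-1)**k for an n-sided die rolled k times: positive means even
    sums predominate, negative means odd sums do, zero means a dead heat."""
    f_at_minus_one = sum((-1) ** face for face in range(1, n + 1))
    return f_at_minus_one ** k

# Sanity check against the text: two or six rolls of a six-sided die tie.
print(parity_balance(6, 2), parity_balance(6, 6))   # 0 0
```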

Things get even more fun when more variables come into the game.^{11}

**SICHERMAN DICE**

We’ve played with taking powers of the generating function *x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6}; now let’s go the other way and factor it.

Just as six jellybeans in a row can be divided into three groups of two or two groups of three, the terms in the sum

*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6}

can be grouped as

(*x*^{1} + *x*^{2}) + (*x*^{3} + *x*^{4}) + (*x*^{5} + *x*^{6})

or as

(*x*^{1} + *x*^{2} + *x*^{3}) + (*x*^{4} + *x*^{5} + *x*^{6}).

The former grouping can be written as *x*^{0} (*x*^{1} + *x*^{2}) + *x*^{2} (*x*^{1} + *x*^{2}) + *x*^{4} (*x*^{1} + *x*^{2}), which equals (*x*^{0} + *x*^{2} + *x*^{4}) (*x*^{1} + *x*^{2}), while the latter can analogously be written as *x*^{0} (*x*^{1} + *x*^{2} + *x*^{3}) + *x*^{3} (*x*^{1} + *x*^{2} + *x*^{3}), which equals (*x*^{0} + *x*^{3}) (*x*^{1} + *x*^{2} + *x*^{3}).

These factorizations give us curious ways to simulate a 6-sided die using a 2-sided die and a 3-sided die. (Multiplication of generating functions is the “mating behavior” I mentioned at the beginning of the essay.)

Let’s go back to the equation

(*x*^{0} + *x*^{2} + *x*^{4}) (*x*^{1} + *x*^{2}) = *x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6}.

If we have a 3-sided die with faces showing 0 pips, 2 pips, and 4 pips (coming from the first factor on the left-hand of the equation) and a 2-sided die with faces showing 1 pip and 2 pips (coming from the second factor on the left-hand of the equation), and we roll them both and record the sum, the six equally likely outcomes are precisely the numbers 1 through 6.

Similarly, consider the equation

(*x*^{1} + *x*^{2} + *x*^{3}) (*x*^{0} + *x*^{3}) = *x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6}.

It tells us that if we roll a 3-sided die with faces showing 1 pip, 2 pips, and 3 pips and a 2-sided die with faces showing 0 pips and 3 pips, and we roll them both and record the sum, the six equally likely outcomes are again the numbers 1 through 6.
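Both simulations take one line each to check by brute force (a sketch of mine, not anyone's official code):

```python
# Each factorization gives a pair of odd dice whose sums are exactly the
# numbers 1 through 6, each occurring once.
from itertools import product

print(sorted(a + b for a, b in product([0, 2, 4], [1, 2])))   # [1, 2, 3, 4, 5, 6]
print(sorted(a + b for a, b in product([1, 2, 3], [0, 3])))   # [1, 2, 3, 4, 5, 6]
```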

This leads us to reinvent what are called Sicherman dice, devised in 1977 by puzzlemaker George Sicherman^{12} (though he did not invent them using generating functions). Remember that the generating function for the sum of two ordinary six-sided dice is (*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6})^{2} . We can use our two factorizations to write this as

(*x*^{0} + *x*^{2} + *x*^{4}) (*x*^{1} + *x*^{2}) × (*x*^{1} + *x*^{2} + *x*^{3}) (*x*^{0} + *x*^{3}).

But, swapping factors, we see that this is equal to

(*x*^{0} + *x*^{2} + *x*^{4}) (*x*^{0} + *x*^{3}) × (*x*^{1} + *x*^{2} + *x*^{3}) (*x*^{1} + *x*^{2})

or (moving a factor of *x* from the *x*-endowed factor *x*^{1} + *x*^{2} to the *x*-impoverished factor *x*^{0} + *x*^{2} + *x*^{4})

(*x*^{1} + *x*^{3} + *x*^{5}) (*x*^{0} + *x*^{3}) × (*x*^{1} + *x*^{2} + *x*^{3}) (*x*^{0} + *x*^{1}).

The first product (*x*^{1} + *x*^{3} + *x*^{5}) (*x*^{0} + *x*^{3}) expands as

*x*^{1} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6} + *x*^{8}

while the second product (*x*^{1} + *x*^{2} + *x*^{3}) (*x*^{0} + *x*^{1}) expands as

*x*^{1} + *x*^{2} + *x*^{2} + *x*^{3} + *x*^{3} + *x*^{4}.

These two polynomials are the generating functions for Sicherman’s dice: the first has six sides bearing the numbers 1, 3, 4, 5, 6, and 8, while the second has six sides bearing the numbers 1, 2, 2, 3, 3, and 4. Despite the fact that the dice look strange, if you roll each once, the sum of the two numbers you roll is statistically indistinguishable from what you’d get from rolling two ordinary dice. And that’s because when you mate the generating functions of Sicherman’s dice, you’re getting the same polynomial factors that you get when you mate the generating functions of two ordinary dice — you just get them in a different order.^{13}
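The indistinguishability claim takes three lines of brute force to confirm (a sketch, assuming nothing beyond the face lists just derived):

```python
# Compare the sum distribution of two standard dice with that of the
# Sicherman pair {1,3,4,5,6,8} and {1,2,2,3,3,4}.
from collections import Counter
from itertools import product

standard = Counter(a + b for a, b in product(range(1, 7), repeat=2))
sicherman = Counter(a + b for a, b in product([1, 3, 4, 5, 6, 8],
                                              [1, 2, 2, 3, 3, 4]))
print(standard == sicherman)   # True: the same distribution of sums
```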

Sicherman dice were introduced to the world at large through the Mathematical Games column of Martin Gardner, the writer who was hailed in his time as “the best friend mathematics ever had”. I grew up reading many of those columns in the 1970s, and if it hadn’t been for Gardner I would be a very different sort of mathematician, if I were a mathematician at all.

On a YouTube channel created in Gardner’s honor called “Celebration of Mind”, Alexandre Muñiz has a nice video about Sicherman dice and other things; in the video he shows how you can take two 2-sided dice and two 3-sided dice and (pairing them up one way) simulate two ordinary dice and (pairing them up another way) simulate the Sicherman dice.

Muñiz and I have come up with a curious kind of die that I hope someone will fabricate and send me. It’s a clear cube made of resin or acrylic with an opaque tetrahedral die embedded in it, with the four corners of the tetrahedron corresponding to four of the eight corners of the cube. (Or maybe the surrounding cube exists only as a network of twelve struts; we still haven’t decided what physical instantiation works best.) In any case, the tetrahedron inside the cube has four faces with 0, 1, 2, and 4 pips respectively. You can check that the number of pips visible from above the die after you roll it is 1, 2, 3, 4, 5, or 6, just as with an ordinary die, but the total number of pips is only 7 instead of 21.

The tetrahedron was my idea; I thought one would roll it on a creased surface (such as the inside of an open book) so that it always lands on an edge, as described in a recent Riddler Express puzzle at FiveThirtyEight.com. Michael Branicky suggested using a taco holder while Zach Wissner-Gross preferred a ridged gnocchi board. Muñiz had the fantastic idea of embedding the tetrahedron in a cube. I can see it in my head, but I’d like to hold one in my hand!

[Postscript: Reader Dave LeCompte has fabricated such a die! A photo appears below.

The photo includes a penny for scale.]

If you find the math behind Sicherman dice fun, you might ask the question, for what values of *n* can you design two non-standard *n*-sided dice with the property that, if you roll both dice and record their sum, the outcome is statistically indistinguishable from what you’d get from rolling two standard *n*-sided dice? (Here the standard *n*-sided die has numbers 1 through *n* on its faces.) We’ve seen that you can do it for *n* = 6; what other values are possible? Post your ideas in the Comments.

Incidentally, the Romans used two kinds of dice: a small six-sided die called a tessera whose sides were marked with the numbers from 1 to 6 (that is, essentially a modern cubical die) and a larger four-sided die called a talus whose sides were marked with the numbers 1, 3, 4, and 6. Do you see a way to “Sicherman-ize” the talus die? That is, do you see how to design two four-sided dice, different from the talus and from each other, with the property that if you roll them together, the distribution of the sum is the same as what you would see if you rolled two talus dice? Post your answer in the Comments.

Not related to dice but definitely related to generating functions is the 3Blue1Brown video “Olympiad-level counting” I recommended last month. If you haven’t watched it yet, now’s the time!

**CROOKED DICE**

I’ll close with a famous puzzle about cheating at dice; it can be solved with generating functions, though it is a bit more advanced than the other puzzles. The question is, can you design two crooked 6-sided dice with the property that their sum is equally likely to be any of the eleven numbers 2, 3, 4, …, 10, 11, and 12?

By a crooked die, I mean a die in which the six outcomes don’t have an equal chance of occurring. In practice, a really lopsided weighting would be (a) hard to achieve and (b) easy to detect, but since you’re reading this as a math-fan and not as a gambler, we’ll model a crooked die as being determined by any six nonnegative numbers *a*_{1}, *a*_{2}, … , *a*_{6} that add up to 1, and imagine that those are supposed to be the probabilities of the die landing with the six respective faces facing up.

I want two crooked dice, one associated with the probabilities *a*_{1}, *a*_{2}, … , *a*_{6} and the other associated with the probabilities *b*_{1}, *b*_{2}, … , *b*_{6}, so that when I roll both dice, the sum of the numbers shown is equally likely to take on each of the eleven possible values between 2 and 12.

To get you started on the problem, I’ll claim (without proof) that the property we’re trying to achieve can be restated in terms of the generating functions

*A*(*x*) = *a*_{1} *x*^{1} + *a*_{2} *x*^{2} + *a*_{3} *x*^{3} + *a*_{4} *x*^{4} + *a*_{5} *x*^{5} + *a*_{6} *x*^{6}

and

*B*(*x*) = *b*_{1} *x*^{1} + *b*_{2} *x*^{2} + *b*_{3} *x*^{3} + *b*_{4} *x*^{4} + *b*_{5} *x*^{5} + *b*_{6} *x*^{6}.

Specifically, we want to choose numbers *a*_{1}, *a*_{2}, … , *a*_{6} and *b*_{1}, *b*_{2}, … , *b*_{6} so that *A*(*x*) times *B*(*x*) equals (1/11) *x*^{2} + (1/11) *x*^{3} + … + (1/11) *x*^{11} + (1/11) *x*^{12}. Can you find a way to do it or show that it can’t be done? Are there two generating functions that we can mate to create such an offspring?^{14}

*Thanks to Sandi Gubin, Alexandre Muñiz, Bill Ossmann, George Sicherman, James Tanton, and Dan Ullman.*

**REFERENCES**

Gary Antonick, “Col. George Sicherman’s Dice”, https://archive.nytimes.com/wordplay.blogs.nytimes.com/2014/06/16/dice-3/

Martin Gardner, “Sicherman Dice, the Kruskal Count and Other Curiosities”, chapter 19 in Penrose Tiles to Trapdoor Ciphers … and the Return of Dr. Matrix.

Alexandre Muñiz, “How to Roll Two Dice”, https://www.youtube.com/watch?v=-aDfFh5YUD8

George Sicherman, “Sicherman Dice”, https://userpages.monmouth.com/~colonel/sdice.html

**ENDNOTES**

[Note: Some of you may have tried to access specific endnotes by clicking on the associated footnote numbers in the main body of the text, and then been frustrated that this didn’t work. I’m frustrated too! I used to have a way to do this but it doesn’t work in the current version of WordPress. If any of you know a good way to create navigable internal links using the current WordPress implementation of hypertext, please let me know. It’s the year 20-friggin’-22; you shouldn’t have to scroll forward and then scroll back to read the endnotes!]

#1. Others like Pierre Fermat played a role in inventing what are now called Cartesian coordinates, but I don’t want to go down that interesting side-track today.

#2. Technically, Descartes would have written *ax*^{2} as *axx*, reserving exponential notation for powers higher than the 2nd.

#3. How does the mathematical engine lurking behind the website do its magic? You can find out from Tanton’s videos, available through links underneath the Personal Polynomial screen. I don’t know the details of this particular computer program, but I know one way the trick can be done, via the time-honored tactic of breaking a problem into pieces. If we can find three polynomials – call them *p*_{1}(*x*), *p*_{2}(*x*), and *p*_{3}(*x*) – satisfying

*p*_{1}(1) = 10, *p*_{1}(2) = 0, and *p*_{1}(3) = 0,

*p*_{2}(1) = 0, *p*_{2}(2) = 9, and *p*_{2}(3) = 0, and

*p*_{3}(1) = 0, *p*_{3}(2) = 0, and *p*_{3}(3) = 13,

then we can add them together to get a polynomial satisfying

*p*(1) = 10, *p*(2) = 9, and *p*(3) = 13.

If we define *q*_{1}(*x*) = (*x* − 2) (*x* − 3), then the polynomial *q*_{1}(*x*) almost does the job that *p*_{1}(*x*) is supposed to do; it satisfies two of the three conditions, specifically, *q*_{1}(2) = 0 and *q*_{1}(3) = 0. Before you check this, let me warn you that if you check it by expanding *q*_{1}(*x*) as *x*^{2} − 5*x* + 6 and then substituting *x* = 2 and *x* = 3, Descartes will turn over in his grave. The right way to see that *q*_{1}(2) is 0 is to plug *x* = 2 directly into the product (*x* − 2) (*x* − 3); then the first factor is 2 − 2, or 0, so the product must be 0, regardless of what the second factor is. Likewise, the right way to see that *q*_{1}(3) = 0 is to plug *x* = 3 directly into the product (*x* − 2) (*x* − 3). Unfortunately, *q*_{1}(1) isn’t 10; it’s (1 − 2) (1 − 3) = 2. But all you have to do to fix that blemish is multiply *q*_{1} by 5. If we put *p*_{1}(*x*) = 5 *q*_{1}(*x*) = 5 (*x* − 2)(*x* − 3), we get *p*_{1}(1) = 10, *p*_{1}(2) = 0, and *p*_{1}(3) = 0, just as we wanted. So we’ve found *p*_{1}(*x*).

Likewise, we can get *p*_{2}(*x*) by starting with *q*_{2}(*x*) = (*x* − 1) (*x* − 3) and multiplying by a suitable fudge factor (namely −9) to get *p*_{2}(*x*) = −9 (*x* − 1) (*x* − 3) satisfying *p*_{2}(1) = 0, *p*_{2}(2) = 9, and *p*_{2}(3) = 0. Lastly, *p*_{3}(*x*) = (13/2) (*x* − 1) (*x* − 2) satisfies *p*_{3}(1) = 0, *p*_{3}(2) = 0, and *p*_{3}(3) = 13. Putting it all together, we form

*p*(*x*) = *p*_{1}(*x*) + *p*_{2}(*x*) + *p*_{3}(*x*) = 5 (*x* − 2) (*x* − 3) − 9 (*x* − 1) (*x* − 3) + (13/2) (*x* − 1) (*x* − 2),

which (after you expand and recombine the terms) becomes (5/2)*x*^{2} − (17/2)*x* + 16. This is the famous method of Lagrange interpolation.
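Here is the whole recipe as a short program (my own sketch of Lagrange interpolation, not the Global Math Project's implementation). It uses exact rational arithmetic, and returns coefficients lowest degree first, the opposite of the display order used above:

```python
# Lagrange interpolation with exact fractions. Given the points
# (1, 10), (2, 9), (3, 13) it rebuilds the coefficients 16, -17/2, 5/2.
from fractions import Fraction

def lagrange_coeffs(points):
    """Coefficients (lowest degree first) of the interpolating polynomial."""
    n = len(points)
    coeffs = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(points):
        basis = [Fraction(1)]       # basis polynomial: 1 at xi, 0 elsewhere
        denom = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            new = [Fraction(0)] * (len(basis) + 1)
            for k, c in enumerate(basis):   # multiply basis by (x - xj)
                new[k + 1] += c
                new[k] -= xj * c
            basis = new
            denom *= xi - xj
        for k in range(n):          # scale so the basis hits yi at xi
            coeffs[k] += yi * basis[k] / denom
    return coeffs

print(lagrange_coeffs([(1, 10), (2, 9), (3, 13)]))
# [Fraction(16, 1), Fraction(-17, 2), Fraction(5, 2)]
```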

#4. What the forecasters did was not quite as dumb as fitting a third-degree polynomial to just four data points. Rather, they took a whole lot of data points and found the third-degree polynomial that fits the data as closely as possible. That’s less dumb, but when you’re crafting national policy in the face of a global health emergency, “less dumb” doesn’t cut it.

#5. You could solve the equation (5/2)*x*^{2} − (17/2)*x* + 16 = 10 by rewriting it as (5/2)*x*^{2} − (17/2)*x* + 6 = 0 and using the quadratic formula, but I prefer to use factoring. Since the left hand side of the equation equals 0 when *x* = 1, the Factor Theorem tells us (5/2)*x*^{2} − (17/2)*x* + 6 must factor as *x* − 1 times some linear polynomial *ax* + *b*. That is, we should be able to find constants *a* and *b* so that (*x* − 1) (*ax* + *b*) expands to give (5/2)*x*^{2} − (17/2)*x* + 6. (Notice how we’ve flipped Descartes’ script: *a* and *b* are the unknowns, not *x*!) Expanding (*x* − 1) (*ax* + *b*) gives *ax*^{2} + (*b* − *a*) *x* − *b*, which is equivalent to (5/2)*x*^{2} − (17/2)*x* + 6 provided that *a* and *b* satisfy the three equations *a* = 5/2, *b* − *a* = −17/2, and −*b* = 6. We can solve the first and last equations to get *a* = 5/2 and *b* = −6 and then check that these values satisfy the middle equation as well. So (5/2)*x*^{2} − (17/2)*x* + 6 factors as (*x* − 1) ((5/2)*x* − 6), telling us that the second root satisfies (5/2)*x* − 6 = 0, or (5/2) *x* = 6, or *x* = 6 / (5/2) = 12/5.
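
A machine check of the two roots, for those who like such things (my aside):

```python
# The interpolating polynomial from the note, and a check that
# x = 1 and x = 12/5 are the two inputs where it equals 10.
def p(x):
    return (5/2)*x**2 - (17/2)*x + 16

print(p(1))      # 10.0
print(p(12/5))   # 10, up to floating-point rounding
```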

#6. Wilf was one of the tallest mathematicians I ever met, and his book had one of the shortest titles of any math book I ever read as measured by word-count, but that’s only because he cheated: his book is called generatingfunctionology instead of something more boring like “Theory and applications of generating functions”. You’ll notice that this month I’m only talking about polynomials, but Wilf’s book also talks about power series, which is the topic of next month’s essay. I should mention that the kind of generating functions Wilf focuses on in his book encode (ordered) sequences, whereas the kind I’m writing about here encode (unordered) sets. Incidentally, Wilf’s name, profession, and stature provided the inspiration for the minor Sesame Street character Herb Wolf.

#7. Gottfried Wilhelm Leibniz, co-inventor of the calculus, asserted in his “Opera Omnia” that when you roll two dice, you’re just as likely to roll a 12 as you are to roll an 11, on the grounds that each outcome can be achieved in exactly one way: the former as 6+6, the latter as 5+6. From this we learn two things: first, that even great mathematicians make mistakes, and second, that Leibniz didn’t spend much time gambling. If Leibniz had spent some of his youth in gambling dens, he would’ve learned (possibly by losing his puffy shirt a few times) that an 11 comes up a lot more often than a 12, and if he’d read the writings of Cardano, Pascal, and Fermat he would have tabulated the 36 equally likely outcomes of rolling two 6-sided dice and he would’ve been able to check that an 11 is exactly twice as likely as a 12. Part of the conceptual difficulty here is that when you roll two dice at the same time, as opposed to rolling a single die twice, it’s harder to see why an 11 corresponds to two separate outcomes. If one die is red and the other is blue, then we can distinguish red-5-and-blue-6 from red-6-and-blue-5, but if the dice are hard to tell apart, then there may not be an easy way for us to tell apart the two outcomes; and if we can’t tell the difference, it’s hard to believe that Tyche, the goddess of chance, should care. But she does!

#8. Yes, I know it’s the name of a Laurie Anderson song.

#9. There are other ways to understand this fact. For instance, take the six-by-six addition table and color entries black if the sum is even and white if the sum is odd. Since each row has three white entries and three black entries, there are equally many white entries as black entries in the table as a whole. Or: notice that we can divide the table into nine 2-by-2 blocks, each of which contains equal numbers of black and white squares.
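
The counting argument is easy to confirm by brute force (a sketch of mine):

```python
# Color check for the 6-by-6 addition table: how many sums are even vs odd?
sums = [a + b for a in range(1, 7) for b in range(1, 7)]
evens = sum(1 for s in sums if s % 2 == 0)
print(evens, len(sums) - evens)   # 18 18
```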

#10. The generating function for a single roll is *x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5}, so the generating function for the sum of five rolls is (*x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5})^{5}. Plugging in *x* = −1, we get (−1 + 1 − 1 + 1 − 1)^{5} = (−1)^{5} = −1, so the terms that the substitution makes negative (the ones of odd degree) collectively overpower (just barely) the positive terms; the sum of the five numbers we roll is ever-so-slightly likelier to be odd than even. On the other hand, (−1+1−1+1−1)^{6} = (−1)^{6} = +1, so with six rolls the balance of power shifts; the sum of six numbers is ever-so-slightly likelier to be even than odd.
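
Brute force agrees with the generating-function shortcut; here's a sketch of mine (it assumes a fair 5-sided die, as in the polynomial above):

```python
from fractions import Fraction
from itertools import product

# P(sum even) - P(sum odd) for n rolls of a fair 5-sided die; the
# generating-function argument predicts this gap equals (-1)^n / 5^n.
def parity_gap(n):
    outcomes = list(product(range(1, 6), repeat=n))
    evens = sum(1 for o in outcomes if sum(o) % 2 == 0)
    return Fraction(2 * evens - len(outcomes), len(outcomes))

print(parity_gap(5))   # -1/3125: odd wins by a hair
print(parity_gap(6))   # 1/15625: now even wins by a hair
```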

#11. So far we’ve looked at generating functions with a single variable *x* (usually called an indeterminate rather than a variable in this context). But generating functions needn’t be limited to a single indeterminate. Consider for instance the expression (*x* + *y*)^{2}, which expands as *x*^{2} + 2*xy* + *y*^{2}. Writing this as 1*x*^{2} + 2*xy* + 1*y*^{2}, we find that this polynomial is the generating function for the sequence 1, 2, 1. Likewise (*x* + *y*)^{3} is the generating function for the sequence 1, 3, 3, 1; (*x* + *y*)^{4} is the generating function for the sequence 1, 4, 6, 4, 1; and so on. These sequences are the rows of the famous triangle of binomial coefficients attributed variously to Pingala (India), Yang Hui (China), Omar Khayyam (Iran), Tartaglia (Italy), and Pascal (France), that starts like this:

It’s a curious fact that if you alternately add and subtract the elements in any row of the triangle other than the top row, you end up with zero. This isn’t so surprising when the second entry in a row is an odd number like 3 or 5, because then positive terms and negative terms cancel in an obvious way (as in 1 − 3 + 3 − 1). It’s less obvious why we should have 1 − 4 + 6 − 4 + 1 = 0 and 1 − 6 + 15 − 20 + 15 − 6 + 1 = 0 and so on, but generating functions once again can help us. Consider (*x* + *y*)^{4} = 1*x*^{4} + 4*x*^{3}*y* + 6*x*^{2}*y*^{2} + 4*xy*^{3} + 1*y*^{4}. Replacing *y* by −*y*, we get

(*x* − *y*)^{4} = 1*x*^{4} − 4*x*^{3}*y* + 6*x*^{2}*y*^{2} − 4*xy*^{3} + 1*y*^{4}.

If we now set *x* = *y* = 1, the left hand side of the inset equation becomes 0^{4}, or 0, while the right hand side becomes 1 − 4 + 6 − 4 + 1, the desired alternating sum. The same trick works for (*x* − *y*)^{n} for any positive integer *n*. (Note, though, that the alternating sum of the entries in the top row of the triangle is 1, not 0. This accords with the fact that there are many mathematical situations, especially in discrete mathematics, where it makes more sense to define 0^{0} to equal 1 rather than 0.) This can be used to show that if you toss a coin one or more times, the probability that the number of heads is even is exactly 1/2, as is the probability that the number of heads is odd.
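
A quick spot-check of those alternating sums, for the skeptical (my aside, using the standard-library binomial function):

```python
from math import comb

# Alternating sums of rows 1 through 7 of the triangle: all zero,
# in agreement with setting x = y = 1 in the expansion of (x - y)^n.
for n in range(1, 8):
    print(n, sum((-1)**k * comb(n, k) for k in range(n + 1)))   # second number is always 0
```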

#12. Sicherman took to styling himself as The Colonel as a joke and many people have referred to him in print as such, mistaking the nickname for a military title.

#13. The polynomial *x*^{1} + *x*^{2} + *x*^{3} + *x*^{4} + *x*^{5} + *x*^{6} is actually a product of four irreducible polynomials (polynomials that cannot be factored further): (*x*) (1 + *x*) (1 + *x* + *x*^{2}) (1 − *x* + *x*^{2}). Combining the *x*, 1 + *x* + *x*^{2}, and 1 − *x* + *x*^{2} gives us the *x*^{1} + *x*^{3} + *x*^{5}; pairing up the 1 + *x* and 1 − *x* + *x*^{2} gives us the *x*^{0} + *x*^{3}, and pairing up the *x* and 1 + *x* + *x*^{2} gives us the *x*^{1} + *x*^{2} + *x*^{3}. It’s a theorem of advanced algebra that factorization of polynomials into irreducibles, like factorization of integers into primes, can be done in only one way. The polynomials 1 + *x*, 1 + *x* + *x*^{2}, and 1 − *x* + *x*^{2} are examples of cyclotomic polynomials; specifically, they are *Φ*_{2}(*x*), *Φ*_{3}(*x*), and *Φ*_{6}(*x*), where *Φ*_{n}(*x*) is the polynomial whose roots are precisely the primitive *n*th roots of 1 – that is, the complex numbers *z* that satisfy *z ^{n}* = 1 but don’t satisfy *z ^{m}* = 1 for any positive integer *m* smaller than *n*.
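
The factorization is easy to verify by multiplying coefficient lists; here's a hand-rolled sketch (mine, not from the note; `polymul` is ordinary polynomial multiplication):

```python
# Multiply out (x)(1 + x)(1 + x + x^2)(1 - x + x^2), with each polynomial
# stored as a list of coefficients (list index = exponent).
def polymul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

prod = [0, 1]  # the factor x
for factor in ([1, 1], [1, 1, 1], [1, -1, 1]):   # 1+x, 1+x+x^2, 1-x+x^2
    prod = polymul(prod, factor)
print(prod)   # [0, 1, 1, 1, 1, 1, 1], i.e. x + x^2 + ... + x^6
```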

#14. Since *A*(*x*) and *B*(*x*) are both divisible by *x*, it’s handy to pull out those factors of *x* and focus on the 5th degree polynomials *A**(*x*) = *A*(*x*)/*x* and *B**(*x*) = *B*(*x*)/*x*, and to write the equation as *A**(*x*) *B**(*x*) = (1/11) (1 + *x* + *x*^{2} + ··· + *x*^{10}). *A**(*x*) is a 5th degree polynomial, and every polynomial of odd degree has a real root, so there exists a real number *r* such that *A**(*r*) = 0, implying that *A**(*r*) *B**(*r*) = 0. If we had *A**(*x*) *B**(*x*) equal to (1/11) (1 + *x* + *x*^{2} + ··· + *x*^{10}), then, plugging in *x* = *r*, we’d get (1/11) (1 + *r* + *r*^{2} + ··· + *r*^{10}) = 0. Multiplying the equation by 11 and then by 1 − *r*, we get (after lots of cancellation) 1 − *r*^{11} = 0. So *r* must satisfy 1 − *r*^{11} = 0, and the only real number *r* with that property is *r* = 1. Since we assumed *A**(*r*) = 0, we must have *A**(1) = 0. But *A**(1) isn’t 0; in fact it’s *a*_{1} + *a*_{2} + ··· + *a*_{6}, which is 1, not 0.

This proof won’t work for *n*-sided dice when *n* is odd, but in this case complex numbers can come to our rescue. I’ll take *n* = 5 for definiteness. We are looking for polynomials *A*(*x*) = *a*_{1} *x*^{1} + *a*_{2} *x*^{2} + *a*_{3} *x*^{3} + *a*_{4} *x*^{4} + *a*_{5} *x*^{5} and *B*(*x*) = *b*_{1} *x*^{1} + *b*_{2} *x*^{2} + *b*_{3} *x*^{3} + *b*_{4} *x*^{4} + *b*_{5} *x*^{5} satisfying *A*(*x*) *B*(*x*) = (1/9) (*x*^{2} + *x*^{3} + ··· + *x*^{10}).

We mathematicians have nobody but ourselves to blame, since it was one of our own (René Descartes) who saddled numbers like sqrt(−1) with the term “imaginary” and another mathematician (Carl Friedrich Gauss) who dubbed numbers like 2+sqrt(−1) “complex”. Now it’s several centuries too late for us to ask everybody to use different words. But since those centuries have given us a clearer understanding of what these new sorts of numbers are good for, I can’t help wishing that, instead of calling them “complex numbers”, we’d called them — well, I’ll come to that in a bit.

Mind you, I totally get why sqrt(−1) got called imaginary. “sqrt(−1)” signifies a number *x* with the property that *x*^{2} = −1, but no respectable number behaves that way. A law-abiding number is positive, negative, or zero. If *x* is positive, *x*^{2} will be positive too. If *x* is negative, *x*^{2} will still be positive, since a negative number times a negative number is a positive number (see my essay “Going Negative, part 1” and other Mathematical Enchantments essays about negative numbers if you’re wondering why the product of two negative numbers is positive). And if *x* is zero, *x*^{2} will be zero. In none of the three allowed cases is *x*^{2} negative, so you can’t have *x*^{2} equal to −1. Sorry; it’s an impossible equation. And you might think that *that* would end the matter …

… except that five hundred years ago algebraists learned, to their astonishment, that expressions involving the square roots of negative numbers can be useful intermediate stages of certain calculations that have sensible final answers. So mathematicians grudgingly invited square roots of negative numbers into the house of mathematics but only through the back door, and chose nomenclature that would let those impossible square roots know in no uncertain terms that they were second-class citizens who could mix with other algebraic expressions but who needed to be out the door when their work was done, before respectable company arrived. (See Endnote #1.)

**MATH AND MYSTICISM**

Isaac Asimov, in his essay “The imaginary that isn’t” (from his book *Adding A Dimension*), describes an encounter he had with a sociology professor while he was an undergraduate in the 1930s. The professor sorted humankind into two groups, “realists” and “mystics”, and asserted that mathematicians belong to the latter camp because “they believe in numbers that have no reality.” When young Asimov asked him to explain, the professor cited the example of the square root of minus one, saying “It has no existence. Mathematicians call it imaginary. But they believe it has some kind of existence in a mystical way.” Asimov protested that imaginary numbers are just as real as any other kind of number, and the professor challenged the upstart to hand him the square root of minus one pieces of chalk, and … but I’ll break off the story there for now, because I like to imagine it going a different way: I like to imagine young Asimov asking, “So, where would you put electrical engineers in your classification of humankind? You know, people like Steinmetz?”

Any professor teaching in an American college in the 1930s would have known of Charles Proteus Steinmetz, even though he’s no longer a household name the way Edison and Tesla are. The “Wizard of Schenectady” was as responsible for the electrification of America as anyone else (arguably more than Edison, who had stubbornly insisted on trying to transmit direct current along power lines until Steinmetz and Tesla and their allies proved the superiority of alternating current). The sociology professor was undoubtedly teaching in a classroom that had artificial light in the ceiling run by electrical generators miles away, thanks to Steinmetz.

“Steinmetz built things. He was a realist,” the professor would have said.

“Oh?” Asimov could have replied. “Then why was he an evangelist for the square root of minus one?”

**THE WIZARD OF SCHENECTADY**

Steinmetz was in many ways Edison’s opposite, and not just because of their different ideas about how power should be transmitted across long distances. Edison was a commanding five foot ten; Steinmetz was only four feet tall. Edison gave his assistants (“muckers”, they were called) puny salaries; Steinmetz once refused to take a raise from his employer because he felt that his assistants weren’t paid enough. Edison had three biological children; Steinmetz never had any because he was determined not to pass on his genes for kyphosis and hip dysplasia, opting instead to adopt a younger colleague as his son and become a loving grandfather to the colleague’s children. But, like Edison, Steinmetz was a workaholic whose success in solving technological problems came in part from the fact that he devoted his life to them.

As a young man in Bismarck’s Prussia, Karl August Rudolph Steinmetz joined a fraternity that bestowed on him the nickname Proteus after the shape-shifting Greek sea-god from whose name we get the adjective “protean”. Later, his membership in a socialist student group got him in trouble with the authorities and he was forced to flee the country. As a disabled person with very little command of English, he was almost turned away at Ellis Island until a friend spoke up for him, exaggerating his talents in a way that his past accomplishments didn’t justify (but his future accomplishments would). Young Karl became Charles and adopted his former nickname as his legal middle name. He went to work for a friend, Rudolph Eickemeyer, and stayed at Eickemeyer’s company out of loyalty even after his early achievements got the attention of bigger companies. When General Electric offered him a large salary increase if he’d leave his friend’s company, Steinmetz was puzzled: what did salary have to do with the principle of loyalty? General Electric resolved the impasse by buying Eickemeyer’s company.

The late 19th century was an era of technological promise, much of it bound up in the transforming potential of electricity. The major problem was how to get electricity to all the different places where it could do its magic. Edison favored the straightforward approach of pushing electrons over wires from point A to point B, since electricity derives from the motion of electrons. But others favored the less intuitive idea of rocking electrons back and forth along a wire, alternately pushing and pulling. That kind of current, an alternating current, has many technological advantages (which is why it predominates today), but alternating current is harder to model mathematically because it’s dynamic in a way that direct current isn’t. Let’s take a look at that.

First, picture a simple direct current (DC) circuit containing a battery, a lightbulb, and two wires connecting the battery to the bulb in both directions. Three important quantities are the voltage *V* (the difference in electrical potential between the two ends of the wire), the current *I* (how many electrons are traveling along the wire), and the resistance *R* (how hard the electrons have to work to make the trip), and as long as the bulb doesn’t burn out, the quantities are constant. If you were to plot current and voltage as functions of time (with time on the horizontal axis and current or voltage on the vertical axis), you’d just see boring horizontal lines. Moreover, there’s a simple equation relating these three quantities, called Ohm’s Law: *V* = *I R*.

But in alternating current (AC) circuits, voltage and current vary over time, and there’s no such simple linear relationship between them. For instance, if you live in the U.S., an outlet labeled “120V” is actually giving you a voltage that oscillates between +170V and −170V; 120 is just the average over time (the “root-mean-square average”, for those of you who care). In the figure below I give a typical plot, showing voltage (the blue curve) and current (the gold curve) as functions of time in an AC circuit. Sometimes the voltage is increasing and the current is increasing, but sometimes the voltage is increasing and the current is decreasing. Both voltage and current follow the pattern of a sine wave, or “sinusoid”, but typically peak current doesn’t coincide with peak voltage; in most circuits there’s a mismatch, or “phase shift”. (See Endnote #2.) Gone is the simplicity of *V* = *I R*.
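
The arithmetic behind that 170V figure, for the curious (my aside): for a sinusoid, the peak value is the root-mean-square value times the square root of 2.

```python
import math

# Peak voltage of a sinusoid whose root-mean-square value is 120 volts.
print(120 * math.sqrt(2))   # about 169.7, rounded to "170V" above
```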

You can see this sort of phase-shift in action if you watch a kid on a swing (after they stop pumping) and you simultaneously attend to position and speed, or more precisely, deflection from the vertical and angular velocity. When the kid is as far to the right as possible, their speed is (instantaneously) zero. When the kid has swung back down to the lowest point of their trajectory (where the deflection is zero), the motion is leftward and the speed is at its maximum. When the kid is as far to the left as possible, their speed is again zero. When the kid returns to the zero-deflection point, the motion is rightward and the speed is at its maximum. And so on. The instant of maximum rightward deflection comes after the instant of maximum rightward speed, specifically 1/4 of a cycle later.

If you were to plot, at each instant, a point whose *x*-coordinate is the deflection of the swing (positive when the swing is on the right, negative when the swing is on the left) and whose *y*-coordinate is the angular velocity of the swing (positive when the swing is moving rightward, negative when the swing is moving leftward), the moving point would trace out a circle, as shown in the illustration. The circle doesn’t exist in physical space; rather, it exists in a notional “phase space”, in which the vertical axis is for the *velocity* of the swing, not its position. (Here I’m ignoring some niceties about pendular motion; specifically, its deflection-velocity plot is only approximately circular, and only when the amplitude is small. And as every kid who’s been on a swing knows, it’s the big deflections that are the fun part!)

The mathematics of circular motion is usually described using trigonometric functions, and indeed one can describe current and voltage in alternating-current circuits using sines and cosines, but the formulas can get quite hairy. What Steinmetz realized is that some seemingly pure math he’d learned in his student days could make the formulas much simpler. (See Endnote #3.)

**PURE IMAGINATION**

Mathematicians had played with imaginary and complex numbers long before their games had any real-world applications. One of the mathematicians who played the hardest was Leonhard Euler, who in 1777 introduced the symbol “*i*” to signify the square root of minus one. Euler operated on the assumption that whatever “*i*” might be, it should satisfy the ordinary rules of algebra. So for instance 2*i* times 3*i* should be

(2*i*)×(3*i*) = (2)×(*i*)×(3)×(*i*) = (2)×(3)×(*i*)×(*i*) = (2×3)×(*i*×*i*) = (6)×(−1) = −6

and 1+*i* times 1+*i* should be

(1+*i*)×(1+*i*) = 1×1 + 1×*i* + *i*×1 + *i*×*i* = 1 + *i* + *i* + (−1) = *i* + *i* = 2*i*

(where the first equality is an application of the distributive law or if you prefer “FOIL“; but see Endnote #4).
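
As it happens, Python's built-in complex type obeys exactly the ordinary-rules-of-algebra convention Euler assumed, with the literal suffix `j` standing in for *i* (a quick aside of mine):

```python
# Euler's two sample products, checked with Python's complex arithmetic.
print((2j) * (3j))       # (-6+0j)
print((1+1j) * (1+1j))   # 2j
```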

Later, the mathematicians Jean-Robert Argand, Caspar Wessel, and Carl Friedrich Gauss independently came up with a visual way to represent complex numbers. You draw a horizontal axis for the real numbers and a vertical axis for the imaginary numbers meeting at a point called the origin, and you depict the complex number *a*+*bi* by a point that’s *a* units to the right of the origin and *b* units up from the origin, as shown for the complex numbers 2+*i*, 3+*i*, and 5+5*i*. (If *a* is negative, go left instead of right; if *b* is negative, go down instead of up.) Note by the way that the origin represents the complex number 0+0*i*, which is simultaneously real and imaginary. My daughter, shortly after learning all this, exclaimed “Wait, so zero has been a complex number under my nose this whole time?” Absolutely!

The wonderful thing about the definition of complex number multiplication is the geometry that’s hiding inside it (exactly the kind of shape-shifting geometry Steinmetz needed). Suppose a specific complex number *a*+*bi* other than 0+0*i* is represented by the specific point *P* in the plane in the manner described above. Let *O* stand for the origin (where the axes cross, aka 0+0*i*), and let *N* be the point on the horizontal axis that corresponds to the complex number 1+0*i*, aka the real number 1. We define the “magnitude” of the complex number *a*+*bi* as the length of segment *OP*, and we define the “phase” or “angle” of the complex number *a*+*bi* as the measure of angle *NOP* (some people call the phase the “argument” but I won’t). For example, when *a*=*b*=1, triangle *NOP* is an isosceles right triangle with legs of length 1, so the magnitude of 1+*i* is sqrt(2) and the phase of 1+*i* is 45°.

The miracle is that if we define multiplication of complex numbers in the way that the ordinary rules of algebra force us to, then **magnitudes multiply and angles add**! For instance, look at 2*i* times 3*i*. 2*i* has magnitude 2 and phase 90°, and 3*i* has magnitude 3 and phase 90°; the product of 2*i* and 3*i*, namely −6, has magnitude 6 = 2 × 3 and phase 180° = 90° + 90°. Or look at 1+*i* times 1+*i*. 1+*i* has magnitude sqrt(2) and phase 45°; its product with itself, namely 2*i*, has magnitude 2 = sqrt(2) × sqrt(2) and phase 90° = 45° + 45°. (A puzzle for some of you: can you show that the “multiplication miracle” holds when the two complex numbers being multiplied are 2+*i* and 3+*i*? See Endnote #5.)
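
The puzzle's pair can at least be spot-checked numerically with the standard `cmath` module; this is arithmetic, not the demonstration the endnote asks for (my aside):

```python
import cmath
import math

# "Magnitudes multiply and angles add" for 2+i times 3+i.
z, w = 2 + 1j, 3 + 1j
print(z * w)                          # (5+5j)
print(abs(z) * abs(w), abs(z * w))    # both are sqrt(50), about 7.071
print(math.degrees(cmath.phase(z)) + math.degrees(cmath.phase(w)))  # about 45
```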

Multiplying a complex number by −1 has the effect of leaving the magnitude alone while rotating the corresponding point halfway around the origin. (In fact, the rule for multiplying complex numbers gives us a new way to understand the rule for determining the sign of the product of two real numbers; see Endnote #6.) In a similar way, multiplying any complex number by *i* has the effect of rotating the corresponding point a quarter of the way around the origin, in the counterclockwise direction; see Endnote #7. Steinmetz realized that the mathematics of multiplication by *i* was a very crisp way of representing the physics of a 90 degree phase shift. (See Endnote #8.) He couldn’t use the letter *i* because electrical engineers were already using *I* to represent current flow (recall our earlier equation *V* = *I R*), so Steinmetz chose to use *j* instead, and to this day many electrical engineers use *j* instead of *i* to signify the square root of −1.

The sociology professor’s mistake (back in Asimov’s story) lay in part in thinking that mathematics is only about static quantities, such as the number of pieces of chalk you’re holding in your hand. But math can also be about things that, like mythical Proteus, keep changing. Take the professor’s chalk and drag it against a blackboard, at an angle that makes a squeaky sound some find painful, and you have an oscillating physical system that could be described by sines and cosines but might better be described through the use of complex numbers. Indeed, the 90 degree phase shift (of which the complex number *i* is the numerical thumbprint) is ubiquitous in physics. I’ve already mentioned electrical circuits and kids on swings, but there are lots of other examples.

Real numbers have magnitude and sign; analogously, complex numbers have magnitude and phase. That’s why I wish complex numbers had been dubbed “phased numbers”. (See Endnote #9.) Real numbers are phased numbers whose phase is either 0 degrees (for positive real numbers) or 180 degrees (for negative real numbers). Likewise, imaginary numbers are phased numbers whose phase is either 90 degrees or 270 degrees.

I should stress that Steinmetz did not experimentally discover that the flow of electrons had a hitherto unnoticed imaginary component — he merely showed that the mathematical formalism of electrical engineering becomes simpler if we *pretend* that the current that we measure is but the shadow, along the real line, of a quantity whose true home is the complex plane. The gif Another way of looking at sine and cosine functions (created by Christian Wolff; permission pending) illustrates this in a lovely way. A green point moves in a circle in the *x*,*y* plane. Its projection onto the *x*,*z* plane gives a blue sinusoid, while its projection onto the *y*,*z* plane gives a red sinusoid that is 90 degrees out of phase with the first one. In our analogy, one of these sinusoids is real current (or real voltage) while the other sinusoid is imaginary current (or imaginary voltage). If we apply this point of view to our AC circuits, then we can revive the equation *V* = *I R* by reinterpreting all three quantities as complex numbers. *V* now represents the complex voltage, *I* now represents the complex current, and *R* gets replaced by a complex number called “impedance” that extends the concept of resistance. (See also Endnote #10.) The linear relation between current and voltage, so handy in the study of direct current circuits, has been restored! And the only price we had to pay was to graduate from the real number line to the complex number plane.
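
Here's a minimal sketch of that restored Ohm's law in code (my own illustration, not Steinmetz's; the series resistor-plus-inductor circuit and all the component values are invented for the example):

```python
import cmath
import math

# Complex Ohm's law V = I * Z for a made-up series R-L circuit at 60 Hz.
R = 100.0                 # resistance, ohms
L = 0.5                   # inductance, henries
w = 2 * math.pi * 60      # angular frequency, radians per second
Z = complex(R, w * L)     # impedance: resistance plus a reactive part
V = complex(120.0, 0.0)   # voltage phasor, taken to have phase 0
I = V / Z                 # the complex current

print(abs(I))                          # magnitude of the current, in amps
print(math.degrees(-cmath.phase(I)))   # how far the current lags the voltage
```

The one division `V / Z` replaces what would otherwise be a tangle of trigonometric identities; that is the whole point of the phasor bookkeeping.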

The mathematics of back-and-forth in one dimension is best expressed in terms of the mathematics of round-and-round in two dimensions. For instance, when you spin this disk clockwise (see Endnote #11), the vertical coordinates of the blue and gold points match up with the behavior of the blue and gold curves I used to illustrate the behavior of voltage and current in an AC circuit. And as the disk spins about its center, the ratio of the gold point to the blue point remains fixed if we view the two points as complex numbers, since the ratio of their magnitudes stays 4-to-3 while their phase-lag stays 90 degrees.

(If any of you reading this are good at creating animations, please let me know; I’d love to be able to include a gif that shows the spinning disk and makes the connection with those sinusoids “pop”!)

Since complex currents and complex voltages are useful fictions, not scientific facts, perhaps the sociologist was right to call this way of thinking about the world “mystical”. But if so, Steinmetz was an extremely unusual and useful kind of mystic: not the kind who makes occult pronouncements about the spirit plane but the kind who, invoking a different sort of plane, brings about a world in which it’s easier to make toast.

**THE GREAT UNIFICATION**

If you’re impatient to get back to Asimov’s sociology professor and find out what really happened in that classroom, you might want to skip this section. But I can’t resist giving you a peek into what mathematicians did with Euler’s *i*, starting with Euler himself. (This is the stuff Steinmetz would have learned as a math major at the University of Breslau.)

The best thing Euler did with the number *i* was discover the equation

*e ^{iθ}* = cos *θ* + *i* sin *θ*

where *e* is the constant 2.718… discovered by Jacob Bernoulli but often called Euler’s number, cos is the cosine function, and sin is the sine function. (For more about *e*, watch the 3Blue1Brown video “What’s so special about Euler’s number e?”, and for more about Euler’s amazing formula, watch the 3Blue1Brown video “What is Euler’s formula actually saying?”.) What makes this equation astonishing is that the left and right sides of this equation come from different worlds. The left side is an exponential function (if we leave aside the suspicious circumstance of the exponent being an imaginary number), and therefore points at phenomena like compound interest, population growth, radioactive decay, and the initial spread of novel pathogens. (It was indeed the application of exponential functions to banking that led Bernoulli to discover *e* in the first place.) Meanwhile, the right hand side (again ignoring the *i*) features two functions, sine and cosine, introduced thousands of years ago for surveying land, navigating seas, and plotting the paths of planets. It would seem that the compounding of interest has little to do with the motions of heavenly bodies, yet Euler’s formula tied them together intimately, showing them to be two different aspects of a single mathematical phenomenon. We celebrate Newton’s unification of terrestrial ballistics with the motion of the Moon, and Maxwell’s unification of electricity, magnetism, and light, but we don’t say nearly enough about how Euler’s discovery built a secret passageway that links numerous disciplines within mathematics and far beyond.
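
A numerical spot-check of the formula, for the skeptical (my aside):

```python
import cmath
import math

# e^{i*theta} versus cos(theta) + i*sin(theta) at a few sample angles.
for theta in (0.0, 1.0, math.pi / 3, math.pi):
    lhs = cmath.exp(1j * theta)
    rhs = complex(math.cos(theta), math.sin(theta))
    print(abs(lhs - rhs) < 1e-12)   # True every time
```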

The complex number *e ^{iθ}* always lies on the circle of radius 1 centered at 0. If we want to talk about other kinds of nonzero complex numbers, we use the representation *re ^{iθ}*, where the positive real number *r* is the magnitude of the complex number and the angle *θ* is its phase.

One picayune but useful consequence of Euler’s monumental discovery is that you don’t have to memorize many trig formulas once you know how to traverse the passageway between the world of exponential functions and the world of trig functions; see Endnote #12. You can also use complex numbers to get algebraic proofs of certain geometric facts (see Jim Simons’ video “Three pretty geometric theorems, proved by complex numbers”) and to find nice solutions to combinatorial puzzles (see Endnote #13 as well as the 3Blue1Brown video “Olympiad-level counting”) and sometimes to reduce nasty-looking geometric optimization problems to manageable complexity (see Endnote #14).

More profound applications of the complex numbers turned up in 19th century mathematics, especially Bernhard Riemann’s work in number theory, leading French mathematician Paul Painlevé to write “Between two truths of the real domain, the easiest and shortest path quite often passes through the complex domain.” (The saying was popularized by Jacques Hadamard through his book *The Psychology of Invention in the Mathematical Field*, in which he prefaced the adage by “It has been written…” without acknowledging Painlevé as the source.)

Even though the advent of complex numbers led to new beginnings in many branches of mathematics, in an important way, it was an ending too. Earlier math had been full of equations whose solutions seemed impossible but which led to new kinds of numbers. Want to solve 2*x* = 1? Invent fractions. Want to solve *x*+2 = 1? Invent negative numbers. Want to solve *x*^{2} = 2? Invent irrational numbers. Want to solve *x*^{2} = −1? Invent imaginary numbers. You might think we could keep at this game forever, writing down impossible equations and then inventing new numbers to render the impossible possible.

For instance, what about the equation *x*^{2} = *i*? You might think we need to go beyond the complex numbers to solve it. But we don’t, and if you remember how multiplication of complex numbers works, it’s not hard to figure out where the square root of *i* is hiding in the complex plane: it has magnitude 1 and phase 45°. That is, it’s *r* + *ir* where *r* = 1/sqrt(2). (There’s also a square root of *i* with phase 225° on the other side of the origin.) So we can solve *x*^{2} = *i* in the complex number system without us having to bring new numbers into the game.
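
Checking the magnitude-and-phase reasoning numerically (my aside):

```python
import cmath
import math

# The square root of i predicted by "magnitude 1, phase 45 degrees":
# the point (1 + i)/sqrt(2).
r = 1 / math.sqrt(2)
z = complex(r, r)
print(z * z)            # very nearly 1j
print(cmath.sqrt(1j))   # the same point, found by the library
```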

This is just one example of a theorem that’s so important that it’s called the Fundamental Theorem of Algebra: if you write down an equation (more specifically a polynomial equation) involving a single unknown number *x*, that equation (unless it reduces to something silly like *x* = *x*+1) will always have a solution in the system of complex numbers. So you could say that with the advent of complex numbers, the discipline of algebra, after many centuries of wandering and struggle, had finally found its true home.
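The theorem needs a real proof, but we can at least watch it in action. Here’s a minimal Python sketch using the Durand–Kerner iteration (my choice of root-finder, not anything from the essay) to locate all four complex roots of *z*^{4} + 1 = 0, an equation with no real solutions at all:

```python
# Durand-Kerner iteration: find all complex roots of a monic polynomial
# at once. coeffs lists coefficients from highest degree down, so
# [1, 0, 0, 0, 1] represents z^4 + 1.

def poly_eval(coeffs, z):
    result = 0j
    for c in coeffs:
        result = result * z + c
    return result

def all_roots(coeffs, iterations=200):
    n = len(coeffs) - 1
    # Start from powers of a point that is neither real nor a root of
    # unity, so the initial guesses are distinct.
    roots = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(iterations):
        for i in range(n):
            numerator = poly_eval(coeffs, roots[i])
            denominator = 1
            for j in range(n):
                if j != i:
                    denominator *= roots[i] - roots[j]
            roots[i] -= numerator / denominator
    return roots

# z^4 + 1 = 0: no real solutions, but four complex ones.
roots = all_roots([1, 0, 0, 0, 1])
for root in roots:
    print(root, abs(poly_eval([1, 0, 0, 0, 1], root)))
```

The four roots it finds are the primitive eighth roots of unity, each at distance 1 from the origin, just as the geometry of phases predicts.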

**WHAT IS REAL?**

But wait — I sense a presence in the room: it’s the spirit of Asimov’s professor, and he’s downright gleeful. “All you’ve proved is that a spirit of mysticism has infected the world of science, thanks to the closet-mathematician Steinmetz and other traitors to Reality!” And (though it’s galling to have a ghost lecture me about mysticism versus realism) I have to admit he has a point. Steinmetz identified as a mathematician in his younger years, before he came to America and switched to engineering. And the “infection” has continued to spread.

At first the use of complex numbers was confined to branches of physics that studied wavelike phenomena. If you want to understand how light works in classical optics, you need to think of a light wave as a kind of self-sustaining feedback loop between an electrical oscillation and a magnetic oscillation, propagating through space. To understand this screwlike motion, you need the twisty mathematics offered by complex numbers.

Then in the first half of the 20th century came the quantum revolution. Physicists came to realize that elementary particles (and to some extent objects made up of those particles, even including macroscopic ones like pieces of chalk) had a wavelike aspect, and that certain phenomena could only be understood if you treated complex numbers not just as a useful fiction but as part of the bedrock of reality.

An elementary particle, viewed as a wave, has a phase, and we can experimentally measure how particles’ phases change when they interact. Probabilities don’t just add; sometimes they cancel, interfering destructively the way waves do. (See Endnote #15.) Quantum physics has phase baked into its structure at the smallest scales that our current theories can reach. It’s not just light that behaves in a screwy way; quantum physics asserts that the whole damned universe is screwy, which is why we need twisty mathematics to describe it.

There’s another sense in which the sociology professor was sort of right (though several centuries behind the times): complex numbers did arise from an approach to math that renounced the physical world and even common sense. The 16th century algebraist Gerolamo Cardano, after deriving the complex roots of the “impossible” equation *x*^{2}−10*x*+40=0, declared his own analysis to be “as subtle as it is useless”. Rafael Bombelli, building on Cardano’s work, made complex numbers more respectable by giving clear and consistent rules for operating with them, but he never attempted to explain what complex numbers *were*. (See Endnote #16.)

Hiding in the background of Bombelli’s work was the radical notion (announced more overtly in 19th century England) that if you give clear and consistent rules for operating with fictional quantities, then you can study those fictional quantities on their own terms as elements of a notional number system, deferring or dismissing the question of what those quantities actually mean. This gives license to a sort of “mysticism” in which mathematicians create new number-systems simply by specifying rules of operation, not worrying about whether or how these number-systems correspond to anything in the real world. Maybe there’ll be an application in a hundred years, or a thousand, or never; who knows? In the meantime, there’s plenty of exploring to do.

In Asimov’s anecdote, when the professor challenges Asimov to hand him the square root of minus one pieces of chalk, the brash undergrad says he’ll do it if the professor first gives him one-half of a piece of chalk. When the professor breaks a piece of chalk into two pieces and gives one to Asimov, saying “Now for your end of the bargain,” Asimov points out that what he’s been handed is a single (smaller) piece: *one* piece of chalk, not one-half. The professor counters that “one-half a piece of chalk” means one half of a *standard* piece of chalk, and Asimov asks the professor how he can be sure that it’s exactly half, and not, say, 0.48 or 0.52 of a standard piece of chalk.

What I take away from the end of Asimov’s story is that the difference between a “concrete” number like one-half and an “abstract” number like the square root of minus one is a difference in degree, not a difference in kind. Both are useful fictions. The fictional aspect of one-half comes into view when we notice that the professor’s attempt to hand Asimov half a piece of chalk depends on both a societal agreement on what a standard piece of chalk is and a societal agreement about how much error is permitted. The latter is a bit hazy; where do we draw the line between dividing something in half and dividing it into two unequal pieces? Come to think of it, I’m sure there are measurable differences between the different pieces of chalk that come out of a chalk factory. Quality control doesn’t require that the differences be indiscernible. So the definition of a “standard” piece of chalk is a bit fuzzy too.

Of course I’m splitting hairs here, and ordinary conversation demands adherence to a largely unspoken agreement about which hairs to leave unsplit. And that indeed is my point. Even a seemingly simple mathematical concept like one-half is a collaboration between the universe and a society of minds observing the universe — just like the square root of minus one.

Or to put it more succinctly: Real numbers are more imaginary than most people realize, and imaginary numbers are more real than most people imagine.

*Thanks to Richard Amster, Sidney Cahn, Jeremy Cote, Sandi Gubin, Henri Picciotto, Tzula Propp, and Paul Zeitz.*

**REFERENCES**

Titu Andreescu and Zuming Feng, *102 Combinatorial Problems From the Training of the USA IMO Team*, 2003.

Isaac Asimov, *Adding A Dimension*, 1964.

Floyd Miller, *The Man Who Tamed Lightning: Charles Proteus Steinmetz*, 1965.

Paul Nahin, *An Imaginary Tale: The Story of sqrt(−1)*, 1998.

David Richeson, “The Scandalous History of the Cubic Formula”, *Quanta Magazine*, 2022.

Danny Augusto Vieira Tonidandel, “Steinmetz and the Concept of Phasor: A Forgotten Story”, 2013.

Paul Zeitz, *The Art and Craft of Problem Solving*, 2006.

**ENDNOTES**

#1. Specifically, one method for solving the equation *x*^{3} = 15*x* + 4 involves writing the solution in the form

*x* = cbrt(2 + sqrt(−121)) + cbrt(2 − sqrt(−121))

before rewriting it as

*x* = (2 + sqrt(−1)) + (2 − sqrt(−1)),

cancelling the two impossible terms of opposite sign, and concluding (correctly) that *x*=4 solves the problem. See the Veritasium video “How Imaginary Numbers Were Invented” as well as David Richeson’s Quanta Magazine article listed in the References for more on this.

#2. This picture is potentially misleading since current and voltage are measured in different units; superimposing them has no physical meaning. However, it’s still a helpful way to compare the phases of the two quantities. In the illustration, current lags behind voltage by one-quarter of a cycle, which is what happens when your only circuit elements are inductors. The phase shift when your only circuit elements are capacitors is also one-quarter of a cycle, but in the opposite direction, with voltage lagging behind current. For circuits that contain both capacitors and inductors, things get complicated; more specifically, as Steinmetz noticed, they get complex!

#3. The two functions I chose for the voltage and current in my figure depicting alternating current were 4 cos *t* and 3 sin *t*, two sine waves that are 90 degrees out of phase. Ignoring the fact that one of them represents a voltage and the other represents a current, let’s add them. Here’s a graph of the function 4 cos *t* + 3 sin *t*:

Notice that we get another sine wave, but it’s in phase with neither 4 cos *t* nor 3 sin *t*. Interestingly, if you were to measure the amplitude of this function — the sum of a sine wave of amplitude 4 and a sine wave of amplitude 3 — you’d find that it’s exactly 5. And if you suspect that this equality has something to do with the 3-4-5 right triangle, then (just like the triangle) you are right! The crucial fact is that the two sine waves being combined were exactly 90 degrees out of phase with each other. If we’d added two sine waves that were in phase with each other, one of amplitude 4 and the other of amplitude 3, we’d get a sine wave of amplitude 4+3=7 because of constructive interference. In the opposite case, where the waves are 180 degrees out of phase with each other, we’d get a sine wave of amplitude 4−3=1 because of destructive interference. And in the intermediate case, where the waves are 90 degrees out of phase with each other, we get a sine wave of “amplitude” 4+3*i*, or rather sqrt(4^{2}+3^{2}) = 5 (the magnitude of 4+3*i*). Sine waves, unlike pieces of chalk in a classroom, can interfere with each other constructively or destructively or in an intermediate manner.
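These interference claims are easy to test numerically. The following Python sketch estimates each amplitude by sampling the waves densely over one full period:

```python
import math

def peak(f, samples=200000):
    # Approximate the amplitude of a periodic function by dense
    # sampling over one full period.
    return max(abs(f(2 * math.pi * k / samples)) for k in range(samples))

def quarter_phase(t):        # 90 degrees out of phase
    return 4 * math.cos(t) + 3 * math.sin(t)

def in_phase(t):             # constructive interference
    return 4 * math.cos(t) + 3 * math.cos(t)

def opposite_phase(t):       # destructive interference
    return 4 * math.cos(t) - 3 * math.cos(t)

print(peak(quarter_phase))   # close to 5 = sqrt(4**2 + 3**2)
print(peak(in_phase))        # close to 7 = 4 + 3
print(peak(opposite_phase))  # close to 1 = 4 - 3
```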

#4. Some computer programmers who implement complex number arithmetic use a slight variant of the formula. On computers, multiplications are more time-consuming (or as one says “expensive”) than additions, so one often focuses on reducing the number of multiplications even if the number of additions is increased. Let *E* = *ac*, *F* = *bd*, and *G* = (*a*+*b*)(*c*+*d*). Then clearly *E*−*F* is *ac*−*bd* and you can check that *G*−*E*−*F* is *ad*+*bc*. So a computer can calculate the real and imaginary parts of the product of two complex numbers using just three real multiplications rather than the obvious four real multiplications.
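Here’s that three-multiplication trick as runnable Python (the function name is mine):

```python
def multiply_3m(a, b, c, d):
    # Multiply (a + bi) by (c + di) using three real multiplications
    # instead of four; returns the (real, imaginary) parts.
    E = a * c
    F = b * d
    G = (a + b) * (c + d)
    return (E - F, G - E - F)   # = (ac - bd, ad + bc)

print(multiply_3m(2, 1, 3, 1))  # (5, 5): the product of 2+i and 3+i
```

The saving comes at the cost of three extra additions/subtractions, which is why this variant only pays off when multiplication really is the expensive operation.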

#5. The magnitudes of 2+*i* and 3+*i* are respectively sqrt(2^{2}+1^{2}) = sqrt(5) and sqrt(3^{2}+1^{2}) = sqrt(10), whose product is sqrt(50), which is the magnitude of 5+5*i*, which is the product of 2+*i* and 3+*i*. Likewise, using some trigonometry you can show that if *𝛼* is the phase of 2+*i* and *𝛽* is the phase of 3+*i*, then *𝛼* plus *𝛽* is exactly 45 degrees, which is the phase of 5+5*i*. One way to prove this is to use the tangent addition formula: we know that tan *𝛼* is 1/2 and tan *𝛽* is 1/3, so tan(*𝛼*+*𝛽*) is

(1/2 + 1/3) / (1 − (1/2)(1/3)) = (5/6) / (5/6) = 1,

implying that *𝛼*+*𝛽* = 45 degrees.
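Both the magnitude claim and the phase claim can be confirmed in a few lines of Python (`cmath.phase` returns an angle in radians, so we convert to degrees):

```python
import cmath
import math

u = 2 + 1j
v = 3 + 1j
w = u * v
print(w)                         # (5+5j)

# Magnitudes multiply: sqrt(5) * sqrt(10) = sqrt(50).
print(abs(u) * abs(v), abs(w))

# Phases add: phase(2+i) + phase(3+i) = phase(5+5i) = 45 degrees.
total = math.degrees(cmath.phase(u) + cmath.phase(v))
print(total)                     # 45.0, up to rounding
```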

#6. Positive numbers have phase 0 degrees and negative numbers have phase 180 degrees. So the rule for the sign of a product of real numbers as embodied in the table

| × | + | − |
|---|---|---|
| + | + | − |
| − | − | + |

is essentially the same as the rule for adding angles that are multiples of 180 degrees as embodied in the table

| + | 0° | 180° |
|---|---|---|
| 0° | 0° | 180° |
| 180° | 180° | 360° (same as 0°) |

#7. I’ve heard of an intriguing bit of kinesthetic pedagogy that Michael Pershan and Max Ray-Riek developed as a way of informally introducing middle-schoolers to complex numbers. Kids arranged on a number line can be led to invent 90 degree rotation as a choreographic enactment of multiplication by the square root of −1! Visit Henri Picciotto’s page Kinesthetic Intro to Complex Numbers to learn more. Teachers and students may also find other things of interest at Picciotto’s more advanced page for complex number pedagogy.

#8. Like most stories that shine a spotlight on a single pioneering innovator, my story leaves out a lot. Steinmetz wasn’t the first or only person to suggest using complex numbers to understand electrical circuits involving alternating current; several people had the idea independently at about the same time. But Steinmetz was the chief proponent of this method in the U.S., and in his writings he compellingly demonstrated its virtues.

#9. Gauss called numbers like 2+3*i* “complex” because of the way they are compounded of a real part (2) and an imaginary part (3*i*). This terminology stresses the additive side of complex numbers, that is, the way you can build them up by adding simpler components together. But that doesn’t tell us anything interesting about what complex numbers are or what they’re good for. Vectors (which we’ll meet in a later essay) are also built by adding simpler components together. For that matter, I could introduce “fruity numbers” like “2-apples-and-3-bananas”, and I could say things like “2-apples-and-3-bananas plus 5-apples-and-7-bananas equals 7-apples-and-10-bananas”, and I could represent fruity numbers using two-dimensional diagrams; then fruity numbers would look a lot like complex numbers and they’d behave the same way vis-a-vis addition, but they’d be very different from complex numbers. What’s distinctive about the complex numbers (as opposed to the fruity numbers) is the specific, meaningful way in which one can multiply them: when you multiply two complex numbers, the phases get added. That’s why I think “phased numbers” is a better name for them.

#10. Going back to the example with *V*(*t*) = 4 cos *t* and *I*(*t*) = 3 sin *t* (the first picture in the essay), we find that the complex voltage 4 *e ^{it}* is equal to the complex current 3 *e ^{i(t − π/2)}* multiplied by the constant (4/3) *e ^{iπ/2}* = (4/3) *i*. That ratio of complex voltage to complex current, which doesn’t depend on *t*, is the circuit’s complex impedance.
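A numerical check of this phasor picture: writing *V*(*t*) = 4 cos *t* as the real part of 4*e*^{*it*} and *I*(*t*) = 3 sin *t* as the real part of 3*e*^{*i*(*t* − π/2)}, the ratio of the two complex signals comes out to the same constant at every moment. A Python sketch (the variable names are mine):

```python
import cmath
import math

for t in [0.0, 0.7, 1.9, 3.1]:
    V = 4 * cmath.exp(1j * t)                   # complex voltage
    I = 3 * cmath.exp(1j * (t - math.pi / 2))   # complex current
    # The real parts recover the original signals:
    print(V.real - 4 * math.cos(t))             # essentially zero
    print(I.real - 3 * math.sin(t))             # essentially zero
    # The ratio doesn't depend on t:
    print(V / I)                                # approximately 1.333...j
```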

#11. The convention of trigonometry is that counterclockwise is the *positive* direction and clockwise is the *negative* direction. I suppose we mathematicians could try to convince the world to use clocks that go the other way, but it’s a hard sell; I expect we’ll have to wait a very long time before this happens.

#12. See for instance the video “Double Angle Identities Using Euler’s Formula”. Some may say “That’s a lot of algebra; isn’t it easier just to look it up?”, but I’m not the only mathematician I know who’d rather re-derive a trig identity via complex exponentials.

#13. Here’s a puzzle of mine that Paul Zeitz used in his book of contest problems. Given a circle of *n* lightbulbs, exactly one of which is initially on, you’re allowed to change the state of a bulb (on versus off) provided you also change the state of every *d*th bulb after it (where *d* is a divisor of *n* other than *n* itself), and provided that all *n*/*d* of the bulbs involved were originally in the same state as one another (that is, all on or all off). For what values of *n* is it possible to turn all the lights on by making a sequence of moves of this kind?

For example, take *n*=12. We have 12 lights in a circle, one of which is on. You’re allowed to toggle 2, 3, 4, 6, or 12 bulbs from off to on (provided that they’re evenly spaced around the circle), and you’re also allowed to toggle 2, 3, 4, 6, or 12 bulbs from on to off (provided that they’re evenly spaced around the circle). Taking as many moves as you need, can you get all the lights to be on? If that’s too hard, can you get all the lights to be off? Or if that’s still too hard, can you get there to be exactly one light on, but it’s a different light than the one that was on at the start?

If you like puzzles, this may be a good time to stop reading and think for a bit.

All of these tasks are impossible, and not just for *n*=12. And we can prove it with complex numbers, provided we know one key fact: if you have two or more evenly-spaced points on the circle of radius 1 in the complex plane, their sum is 0. I won’t prove the fact here, but let’s see how it shows that the lights puzzle can’t be solved. The trick is to look at the *sum* of the positions of the bulbs that are on, using complex number addition. Every legal move toggles a set of two or more evenly spaced bulbs, and by the key fact, the positions of those bulbs sum to zero; so turning them all on adds zero to the running sum, and turning them all off subtracts zero. Either way, the sum of the positions of the turned-on bulbs never changes. But the sum starts out nonzero (it’s the position of the single lit bulb), whereas each of the three target configurations has a different sum: all *n* bulbs on gives a sum of zero (the *n* bulb positions are themselves evenly spaced), all bulbs off gives an empty sum of zero, and a single different lit bulb gives that bulb’s position. None of these equals the starting sum, so none of the targets can be reached.
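The key fact, and the invariant built on it, can be checked directly. This Python sketch verifies, for *n* = 12, that every legal move toggles a set of bulbs whose positions sum to (essentially) zero:

```python
import cmath
import math

def bulb_position(n, j):
    # Position of bulb j on a circle of n bulbs, as a unit complex number.
    return cmath.exp(2j * math.pi * j / n)

def toggled_set(n, d, start):
    # The bulbs affected by one move: bulb `start` and every dth bulb
    # after it, going around the circle -- n/d evenly spaced bulbs.
    return [bulb_position(n, (start + k * d) % n) for k in range(n // d)]

n = 12
deviations = []
for d in [1, 2, 3, 4, 6]:          # divisors of 12 other than 12 itself
    for start in range(n):
        deviations.append(abs(sum(toggled_set(n, d, start))))
print(max(deviations))             # essentially zero

# So every move changes the sum of the lit positions by 0:
# the sum is an invariant of the game.
```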

#14. Here is a special case of a new problem I call the “repelling propellers problem”. I place blue dots at the 12 o’clock, 3 o’clock, 6 o’clock, and 9 o’clock positions on a circle. I want to place three red dots on the circle as well, 120 degrees apart from one another, in such a way as to maximize the product of all twelve of the red-point-to-blue-point distances. How do I do it? It looks nasty; there are twelve point-to-point distances to be multiplied, and each of them will be something like a trig function or involve a square root if we adopt a straightforward approach. But complex numbers yield a nice solution. Here again, you might want to stop reading and think on your own for a bit (though you’ll need to know some things about complex numbers that aren’t explained in my essay).

Let the circle in question be the unit circle in the complex plane, so that the blue points are at 1, *i*, −1, and −*i*. Let *ω* be cis 120° so that *ω*^{2} is cis 240°; if *z* represents the position of one red point, the other red points are at *ωz* and *ω*^{2}*z*. (You can think of the seven points as the tips of propellers that rotate around 0, with a repelling force between the red propeller tips and the blue propeller tips.)

The distance between two complex numbers *𝛼* and *𝛽* is |*𝛼* − *𝛽*|, the magnitude of their difference. So the product of the twelve red-to-blue distances is

|*z* − 1| |*z* − *i*| |*z* + 1| |*z* + *i*| |*ωz* − 1| |*ωz* − *i*| |*ωz* + 1| |*ωz* + *i*| |*ω*^{2}*z* − 1| |*ω*^{2}*z* − *i*| |*ω*^{2}*z* + 1| |*ω*^{2}*z* + *i*|.

Since the magnitude of a product of complex numbers equals the product of the magnitudes and vice versa, we can rewrite this expression as the magnitude of the product

(*z* − 1)(*z* − *i*)(*z* + 1)(*z* + *i*)(*ωz* − 1)(*ωz* − *i*)(*ωz* + 1)(*ωz* + *i*)(*ω*^{2}*z* − 1)(*ω*^{2}*z* − *i*)(*ω*^{2}*z* + 1)(*ω*^{2}*z* + *i*).

But this product (call it *P*(*z*)) can be written in a much simpler way. To see how, consider the values of *z* that make the product equal to 0. Go back to the geometrical problem but consider the reverse desideratum: if you want to make the product of the twelve distances as *small* as possible, the best places to put those red dots are at 12 o’clock, 1 o’clock, 2 o’clock, …, 10 o’clock, and 11 o’clock, because for all of those positions, there’ll be a red dot and a blue dot that coincide and so are at distance zero from each other, which makes the product of the twelve distances zero as well. That means that the twelve complex numbers cis 0°, cis 30°, cis 60°, …, cis 300°, and cis 330° are all roots of the degree-twelve polynomial *P*(*z*). But those complex numbers are just the roots of the equation *z*^{12} − 1 = 0. So *P*(*z*) must equal *C* (*z*^{12} − 1) for some constant *C*.

We’re making progress here, and though you may be worried about the fact that I haven’t worked out what *C* is, you’ll soon see that we don’t need to know it. |*P*(*z*)|, the quantity we need to maximize, is |*C* (*z*^{12} − 1)|, which equals |*C*| |*z*^{12} − 1|, and |*C*| is constant, so all we’re really trying to do is maximize |*z*^{12} − 1|, which is the distance between *z*^{12} and 1. That is, we want to find a point *z* on the circle of radius 1 that makes *z*^{12} as far from 1 as possible. But as *z* varies over the circle of radius 1, so does *z*^{12}, and the point on this circle that’s as far from 1 as possible is the point −1. So we need *z*^{12} = −1, which we can achieve (for instance) with *z* = cis 15°, placing red dots at 2:30, 10:30, and 6:30. In fact, if you place a red dot halfway between any two consecutive hour-marks, and place the other two dots accordingly, you’ll get one of the four dot-configurations that maximizes the product of the red-to-blue distances. (If you’re curious, placing the dots in this way makes the product of all twelve of those distances equal to exactly 2. But my problem didn’t ask you to figure that out.)
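Here’s a numerical check of the solution (a Python sketch; the helper `cis` just computes cos + *i* sin of an angle given in degrees):

```python
import cmath
import math

def cis(degrees):
    return cmath.exp(1j * math.radians(degrees))

blues = [1, 1j, -1, -1j]      # 12, 3, 6, and 9 o'clock
omega = cis(120)

def distance_product(z):
    # Product of the twelve red-to-blue distances when the red dots
    # sit at z, omega*z, and omega^2 * z.
    reds = [z, omega * z, omega ** 2 * z]
    product = 1.0
    for red in reds:
        for blue in blues:
            product *= abs(red - blue)
    return product

# The claimed optimum: a red dot halfway between hour marks.
print(distance_product(cis(15)))    # approximately 2

# Sweep the whole circle in steps of 0.1 degrees: nothing beats 2.
best = max(distance_product(cis(k / 10)) for k in range(3600))
print(best)                         # also approximately 2
```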

In a more general version of the problem, there are two propellers, one with *p* evenly spaced blades and one with *q* evenly spaced blades. If you can solve the *p*=3, *q*=4 case, you shouldn’t find the general case much harder. An even more general version of the problem features more than two propellers; I don’t know a general solution.

#15. Freeman Dyson, in his article “Birds and Frogs”, published in 2009 in the *Notices of the American Mathematical Society* (volume 56, pages 212–223) wrote: “Schrödinger … started from the idea of unifying mechanics with optics. A hundred years earlier, Hamilton had unified classical mechanics with ray optics, using the same mathematics to describe optical rays and classical particle trajectories. Schrödinger’s idea was to extend this unification to wave optics and wave mechanics. Wave optics already existed, but wave mechanics did not. Schrödinger had to invent wave mechanics to complete the unification. Starting from wave optics as a model, he wrote down a differential equation for a mechanical particle, but the equation made no sense. The equation looked like the equation of conduction of heat in a continuous medium. Heat conduction has no visible relevance to particle mechanics. Schrödinger’s idea seemed to be going nowhere. But then came the surprise. Schrödinger put the square root of minus one into the equation, and suddenly it made sense. Suddenly it became a wave equation instead of a heat conduction equation. And Schrödinger found to his delight that the equation has solutions corresponding to the quantized orbits in the Bohr model of the atom. It turns out that the Schrödinger equation describes correctly everything we know about the behavior of atoms. It is the basis of all of chemistry and most of physics. And that square root of minus one means that nature works with complex numbers and not with real numbers. This discovery came as a complete surprise, to Schrödinger as well as to everybody else.”

#16. Bombelli’s *Algebra* wasn’t just the first text to explain the rules governing complex numbers; it was also the first clear European treatment of the rules governing negative numbers. Of course, Chinese and Indian mathematicians already knew about negative numbers and how to work with them, but they hadn’t tried taking square roots of negative numbers as far as I’m aware. Then again, the Indian mathematician Brahmagupta came up with a formula that in some ways foreshadows the discovery of complex numbers. Remember when I said that when you multiply two complex numbers, their magnitudes get multiplied? Write those two complex numbers as *a*+*bi* and *c*+*di*, so that their product is (*ac*−*bd*)+(*ad*+*bc*)*i*. The magnitudes of these three complex numbers are sqrt(*a*^{2}+*b*^{2}), sqrt(*c*^{2}+*d*^{2}), and sqrt((*ac*−*bd*)^{2}+(*ad*+*bc*)^{2}). So my assertion about how magnitudes multiply becomes the formula sqrt(*a*^{2}+*b*^{2}) sqrt(*c*^{2}+*d*^{2}) = sqrt((*ac*−*bd*)^{2}+(*ad*+*bc*)^{2}), which if you square both sides becomes the simpler but still surprising formula (*a*^{2}+*b*^{2})(*c*^{2}+*d*^{2}) = (*ac*−*bd*)^{2}+(*ad*+*bc*)^{2}, true for all real numbers *a*,*b*,*c*,*d*. This formula tells us that if two positive integers can each be written as a sum of two perfect squares, so can their product. Brahmagupta knew this formula (and others like it), but he couldn’t have known that a thousand years later it would play a role in the study of complex numbers!
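Since Brahmagupta’s identity is a polynomial identity in four variables, a direct check is easy. Here’s a Python sketch that tests the classic instance 5 × 13 = 65 and then sweeps a grid of integers:

```python
def brahmagupta_check(a, b, c, d):
    # (a^2 + b^2)(c^2 + d^2) == (ac - bd)^2 + (ad + bc)^2, exactly.
    left = (a * a + b * b) * (c * c + d * d)
    right = (a * c - b * d) ** 2 + (a * d + b * c) ** 2
    return left == right

# A classic instance: 5 = 1^2 + 2^2 and 13 = 2^2 + 3^2, so their product
# 65 is also a sum of two squares: (1*2 - 2*3)^2 + (1*3 + 2*2)^2 = 16 + 49.
print(brahmagupta_check(1, 2, 2, 3))    # True

# Integer arithmetic is exact, so we can verify the identity over a grid:
print(all(brahmagupta_check(a, b, c, d)
          for a in range(-5, 6) for b in range(-5, 6)
          for c in range(-5, 6) for d in range(-5, 6)))    # True
```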

You could say that the source of their eventual breakup was present from the start, when they put together a model of the universe at their wedding. It was a sweet but, in hindsight, naive gesture. You see, Math had discovered that there were exactly five ways of sticking identical regular polygons together to form perfectly symmetrical solids (we humans named these regular shapes “Platonic solids” in honor of the philosopher who officiated at the wedding, though he didn’t discover any of them); delighted by the discovery, Math brought the five solids to the ceremony, as gifts for her bride-to-be. Meanwhile, Phyz brought her own gifts: earth, air, fire, water, and “quintessence” (heaven-stuff), the five elements from which she said the universe was constructed. (See Endnote #1.) Five regular solids? Five elements? Surely this marriage was foreordained! (See Endnote #2.) Math and Phyz exchanged gifts and proclaimed their bond, swearing that they would never part. And if any onlookers thought the correspondence between the gifts was forced, they had the good manners to keep their mouths shut.

But after a few millennia, latent tension in the relationship rose to the surface. Physics kept growing and changing, revising her core principles, sheepishly deciding for instance that earth, air, fire, and water weren’t true elements after all. But Math couldn’t help noticing that even as Phyz discovered new elements, Math didn’t have to update her inventory of regular solids. She had in fact found a proof that there couldn’t be any more, and the proof remained valid down the centuries, even as Phyz kept revising her own basic tenets. Oh, and here’s another example: Physics said that projectiles rise in a straight line before falling along a curve, until she said oops, no, they rise along a curve too. Math was embarrassed by the flightiness and unreliability of Phyz, even as Phyz was embarrassed by the stodginess of Math.

Over time Math became more fussy and equivocal. She began to hedge her statements, refusing to say what was true, but merely making conditional assertions of the form “Well, *if* assumptions A, B, and C are true, *then* conclusions X, Y, and Z follow.” Or: “*To the extent that* assumptions A, B, and C are approximately true, *to that same extent* conclusions X, Y, and Z should hold as approximations as well.” Though she hated the way she sounded when she said things like that.

But you shouldn’t think that Math was merely retreating into wishy-washiness or sterile perfectionism. Math was growing just as much as Physics was, but in different ways. And it wasn’t that Math lacked commitment to her relationship with Physics; she just felt too confined by where Phyz lived. Eventually, sometime around 1900, she said “I need to see different universes,” and she moved out.

**HIGHER DIMENSIONS**

One issue that highlights the divide between Math and Physics is the issue of higher dimensions. Do they exist? Math and Physics have very different answers. In physics, the most naive (and mostly right) answer is “No”: you can’t construct an object with four lines that are at right angles to one another. (Of course, you can change the question and then the answer becomes “Yes”, and then you can change it again and the answer becomes “Maybe”, but I’ll get back to that shortly.) On the other hand, in mathematics, we can lay down axioms for *n*-dimensional Euclidean geometry not just for *n*=2 and *n*=3 but for any positive integer *n*. From these axioms, consequences can be derived, and every mathematician will obtain the same consequences, so higher-dimensional spaces are as real as any other mathematical construct: they’re consistent creations of the human mind with properties that all logical minds will assent to, not because the axioms are true (whatever that would mean!) but because the entities under discussion satisfy the axioms by definition. Mathematics nowadays is a language for describing possible universes, of which the universe that we happen to inhabit is just one example.

Turning away from my conceit of Mathematics and Physics as personified beings and turning towards a consideration of human history, consider the careers of the mathematician Ludwig Schläfli (1814-1895) and the physicist Albert Einstein (1879-1955). Schläfli wanted to know what sorts of higher-dimensional regular solids (“regular polytopes” is the more technically correct phrase) exist in *n*-dimensional Euclidean space for values of *n* bigger than 3. He showed that there are *six* regular solids in 4-dimensional Euclidean space but only *three* regular solids in *n*-dimensional Euclidean space when *n* is 5 or 6 or any higher integer. On the other hand, Albert Einstein pursued a view of physics in which our 3-dimensional space needs to be conceived of as part of a 4-dimensional geometry of “spacetime” in which the properties of space and time become interwoven. (See Endnote #3.)

Despite the fact that the two thinkers’ lives overlapped — indeed, Einstein’s precocious ruminations about riding a beam of light occurred around the time of Schläfli’s death — in an important sense their work did not overlap at all. Partly that’s because the two great relativity theories that Einstein developed aren’t Euclidean; special relativity uses what we now call Minkowski space (with time playing a privileged role that distinguishes it from the other three dimensions; see Endnote #4), and general relativity makes the game even deeper by allowing Minkowski space to warp and bend. But more importantly, Einstein was concerned with *our* world while Schläfli was concerned with idealized *mathematical* worlds.

Nowadays there are speculations that our physical universe might have extra dimensions that are too small to see. The possibility of there being extra dimensions is a tantalizing one (it’s the “Maybe” I mentioned earlier), but in math, extra dimensions are more than a possibility: they become an actuality, albeit just one of many coexisting actualities, because math (as we understand it nowadays) isn’t about *actuality*, but about *possibility*.

**GNOSTIC PHYSICS VERSUS GNOSTIC MATH**

To help me further clarify the divide between math and physics, I’ll recruit a couple of hypothetical demiurges (similar to the one postulated by Gnostics, but nerdier).

So, imagine if all at once all over the world a booming voice were heard, saying: “Hello, hello! Hello everyone. (Is this working? Oh good.) I am a mighty Demiurge, and I have decided to confess: I’ve been messing with you. Most of you are aware that some religious fundamentalists on your planet believe that I or someone like Me planted dinosaur bones in the ground as a test of your faith. Well, I *have* been messing with you. But not using dinosaur bones. Instead, I’ve been interfering with the behavior of electricity and magnetism and light on your planet for several centuries. The bottom line is, Newtonian physics is correct, and special relativity is wrong. There actually *is* a preferred reference frame for observers; the luminiferous ether is real; Maxwell’s equations are wrong; et cetera, et cetera. Surprise!”

Such a pronouncement would give us reason to reconsider Einstein’s theories, but not Schläfli’s. The existence of exactly six regular polytopes in four dimensions is a fact of pure reason, not an experimental observation. *If* we lived in a four-dimensional Euclidean space, *then* there would be exactly six different ways to stick regular three-dimensional polyhedra together to form regular four-dimensional polytopes. The Demiurge’s proclamation wouldn’t change that.

In contrast, we might imagine a different Demiurge who pipes up “That’s nothing! *I* messed with Ludwig Schläfli’s head, and the heads of *everyone* who ever read his work or reconstructed it for themselves, so that nobody would notice the logical fallacy in his proof and discover the hidden seventh regular polytope; every time one of you humans reads the argument for why it can’t exist, I make your brains go BLOOP at the crucial moment and you miss the mistake!” That’s an entirely different kind of mischief. It’s one thing for us to suspect that our observations have misled us; it’s a more disturbing thing to suspect that our processes of reasoning are themselves flawed, and this suspicion quickly leads us to far more radical doubts that undermine not just Schläfli’s work or Einstein’s but the entire scientific enterprise and our whole sense of self.

(For instance, I don’t *think* that I’m just a brain in a vat. But wait a second: what right have I to say what I think or don’t think, if I can be mistaken about what my own thoughts are? But wait another second: the words “what right have I to say that” only make sense if I can say things, and if I’m a brain in a vat, I only *think* I’m saying things. Then again, what does “wait a second” even mean if Time itself is an illusion? And …)

To stress the difference between physics and mathematics, I’ll borrow a phrase introduced by the paleontologist Stephen Jay Gould to try to broker an amicable divorce between science and religion. Gould called the disciplines “non-overlapping magisteria” and contended that they weren’t in conflict because science’s questions are “what/when/where?” questions while religion’s questions are “why?” questions, and that there can be no contradiction between the *is* and the *ought*. There are some problems with Gould’s attempt to resolve a key tension of the modern age, not least of which is that fundamentalists of various faiths maintain that their religious scriptures give clear statements of What Is from the Creator of the Universe (who presumably would be in a position to know). But my point is that, in a similar way, math and physics fail to collide because they fail to connect. Math is an engine for deriving non-obvious consequences of assumptions, but it cannot tell us which assumptions to make. We can compare the predictions of mathematics with observations of the real world and use the resulting concordance or discrepancy to decide whether the assumptions that led to those predictions are useful in explaining the world, but when we do this we are doing physics, not math.

**THE THREE FACES OF PI**

Then again, maybe we should think of math and physics as *overlapping* magisteria, and picture things like the famous quantity pi (3.14159…) as living in the overlap. On the one hand, pi is a physical quantity that measures the ratio between the circumferences and diameters of actual, physical circles; on the other hand, it’s a mathematical quantity that is for instance equal to 4 times the limit of the infinite sum 1 – 1/3 + 1/5 – 1/7 + … Let’s call the former *physical pi* and the latter *formulaic pi*. In fact, I want a third pi that I’ll call *geometric pi*. Geometric pi is the ratio between the circumference and the diameter of an ideal mathematical circle, whether or not such circles exist in our world. Mathematical reasoning can lead us, by a beautiful but complicated path, from geometric pi (“circumference divided by diameter”) to formulaic pi (4 times (1 – 1/3 + 1/5 – 1/7 + …), or whatever other formula for pi you prefer), but it doesn’t tell us whether Euclid’s axioms are a true description of our world. If they’re not, then geometric pi and formulaic pi, although exactly equal to each other, don’t pertain to the world we live in except perhaps as approximations.

Let’s take this idea further. Physical pi involves measuring things, and it can only be known up to finite accuracy. If we can only build circles up to 10^{20} meters across, and we can only measure them to within an accuracy of 10^{–20} meters, then we can only know the diameter or circumference of a circle with 40 significant figures, and when we take the ratio of two such measurements (the circumference and the diameter), we again get only 40 significant figures. In this setting, does it make sense to talk about the hundredth digit of that ratio as having a definite value if there is no way to measure it? Indeed, quantum physics tells us that the whole game of measuring lengths becomes problematic at the subatomic scale. Likewise, general relativity says that once you start building things (like a super-big blackboard on which to draw a super-big circle), the things you build will warp space, causing deviations from Euclid’s axioms (which only apply to flat space, not curved space). So when we talk about computing hundreds of digits of pi, we don’t — *can’t* — mean pi the physical constant; we must mean pi the mathematical quantity, defined by expressions like 4 times (1/1 – 1/3 + 1/5 – 1/7 + …).

A Demiurge might be able to warp space to change our measurements, but it’d have to warp our brains to make us think that 4(1/1 – 1/3 + 1/5 – 1/7 + …) equalled 5, say.

By the way, I don’t want to leave you with the mistaken impression that formulaic pi denotes the value given by the specific formula 4(1/1 – 1/3 + 1/5 – 1/7 + …). There are thousands of known formulas for pi, and it’s the totality of them that constitutes what I’m calling formulaic pi, not any one of them in particular. We know that they’re all equal not by measuring objects but by reasoning about mathematical expressions, in the place where Math lives.
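To see in miniature how two of those thousands of formulas home in on the same number, here is a short sketch comparing the slow sum 4(1 – 1/3 + 1/5 – 1/7 + …) with John Machin’s much faster 1706 formula, pi = 16 arctan(1/5) – 4 arctan(1/239). (The function names are mine, invented for illustration.)

```python
import math

def leibniz_pi(n_terms):
    """Partial sum of 4 * (1 - 1/3 + 1/5 - 1/7 + ...)."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

def machin_pi():
    """Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)."""
    return 16 * math.atan(1 / 5) - 4 * math.atan(1 / 239)

# The two formulas look nothing alike, yet they close in on
# the same number -- a fact established by reasoning, not measurement.
print(leibniz_pi(1_000_000))  # slow: after a million terms, error ~ 2e-6
print(machin_pi())            # agrees with math.pi to machine precision
```

The alternating sum’s error after *n* terms is bounded by its next term, about 4/(2*n*), which is why the “sin *x*” button on a calculator relies on cleverer formulas than the simplest one.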

**WHERE MATH WENT**

It’s hard to say where Math went when she moved out of our universe. Plato pointed the way to her new home when he wrote “You know that the geometers make use of visible figures and argue about them, but in doing so they are not thinking of these figures but of the things which they represent; thus it is the absolute square and the absolute diameter which is the object of their argument, not the diameter which they draw.” That is, human geometers may draw pictures, but when we draw those pictures we’re thinking about something Absolute, even if it’s in a realm we can’t get to.

In her new home, Math doesn’t have to equivocate and add “… (assuming that Euclid’s axioms are correct)” at the end of every statement of a theorem of Euclidean geometry; she can just make assertions about ideal squares, diameters, circles, etc. that perfectly satisfy Euclid’s axioms, period. Build a Euclidean square whose base is the diagonal of some given Euclidean square, and the new square has area exactly twice the area of the old square. In the place to which Math has gone, there’s no need to worry about black holes warping the picture, or quantum foam undermining the diagram at sub-nanoscale. The constructs that pervade Math’s new home are precisely what they were constructed to be. In some ways it’s a lonely place, but it’s where you need to go if you want to connect with perfect truth, and to know the things you can be absolutely, positively sure of.

Since we humans can’t get to where Math went, we argue about whether the place even exists, and Math is cool with that. And she’s not even lonely, because guess who’s been visiting her there, and sometimes even spending the night? *Physics!* Phyz wants to talk about quantum field theory in *n* dimensions and how it relates to general relativity in *n*+1 dimensions, for all values of *n*!

In the best comedies of remarriage, the two parties to the marriage have done some growing during their period of separation. Perhaps each of them has developed characteristics of the other, becoming better-rounded people in the process. Or perhaps they have become more tolerant of themselves and others. Either way, the new relationship they develop is not the same as the one they had before.

Going back to our celestial Couple, Physics came to accept that Math’s flirtatiousness, her inability to be satisfied with just one universe (or even some large but finite number of them!), wasn’t just a sign of immaturity; her flirtatiousness was a key component of her nature. Phyz realized that even if *she* (Phyz) was content with 4 dimensions, or 11, or 26, Math could never stop there, nor would Phyz really want her to. Higher dimensions and curvature and bizarre topologies and even weirder variations on the theme of what space could be — Phyz now understood that all of this was part of what makes Math wonderful.

But Mathematics learned something too. All along, she’d thought of herself as the imaginative free-spirited one, and physics as the uncreative plodder. But then came string theory, and a particular prediction of string theory called the gauge-gravity correspondence. It was inspired by the physical world, and it might in the end make predictions about the real world, but beyond that possible application, it gave rise to beautiful new theorems. Who could have imagined physics providing inspiration to algebraic geometry? Algebraic geometry was one of the purest precincts of math. Surely if there was to be any traffic between the disciplines, math would inspire physics, and not the other way round! Yet in recent decades, ideas about fundamental particles that may or may not turn out to be good descriptions of the world we live in have provided inspiration to pure mathematicians, providing blueprints for some of the loftiest airborne castles mathematicians are trying to build.

The parade of ideas being imported from physics into mathematics doesn’t undermine my claim about math and physics being separate magisteria, but it sure does complicate it!

Some may rightly point out that traffic between math and physics has been bidirectional for a while. They’ll point to Richard Feynman’s non-rigorous path-integrals, or to Oliver Heaviside’s even earlier non-rigorous delta function, which didn’t fit into mathematics when they were first formulated, and whose successes forced an enlargement of mathematics. But string theory is the best example to date. It’s not clear whether, without physics to inspire them, mathematicians would have made the leaps of imagination that led to mathematical string theory — even though the standard stereotype is that mathematicians are the unfettered makers of creative leaps while physicists are constrained by the need to describe the physical world.

Anyway, getting back to our two Personifications, and to my imaginary movie about their divorce and remarriage: In the last scene of the film, Physics and Mathematics return to the place where they first took their vows, and we see Phyz giving a new present to Math: an arXiv preprint discussing new connections between mirror symmetry and the geometric Langlands program. A look of shock comes to Math’s face, replaced by a slowly dawning delight. We the moviegoers don’t know what kind of new relationship the two of them will have going forward, and we’re not sure they know either. But we can tell from the look on Math’s face that what has just been bestowed on her was absolutely, positively the perfect gift.

*Thanks to Sandi Gubin.*

**ENDNOTES**

#1: Here’s what Plato said (in the dialogue *Timaeus*) about the correspondence between four of the five regular solids (the cube, octahedron, tetrahedron, and icosahedron) and the four elements that comprised the physical world according to Greek thought (earth, air, fire, and water):

*“To earth, then, let us assign the cubical form; for earth is the most immoveable of the four and the most plastic of all bodies, and that which has the most stable bases must of necessity be of such a nature. Now, of the triangles which we assumed at first, that which has two equal sides is by nature more firmly based than that which has unequal sides; and of the compound figures which are formed out of either, the plane equilateral quadrangle has necessarily, a more stable basis than the equilateral triangle, both in the whole and in the parts. Wherefore, in assigning this figure to earth, we adhere to probability; and to water we assign that one of the remaining forms which is the least moveable; and the most moveable of them to fire; and to air that which is intermediate. Also we assign the smallest body to fire, and the greatest to water, and the intermediate in size to air; and, again, the acutest body to fire, and the next in acuteness to, air, and the third to water. Of all these elements, that which has the fewest bases must necessarily be the most moveable, for it must be the acutest and most penetrating in every way, and also the lightest as being composed of the smallest number of similar particles: and the second body has similar properties in a second degree, and the third body in the third degree. Let it be agreed, then, both according to strict reason and according to probability, that the pyramid is the solid which is the original element and seed of fire; and let us assign the element which was next in the order of generation to air, and the third to water. We must imagine all these to be so small that no single particle of any of the four kinds is seen by us on account of their smallness: but when many of them are collected together their aggregates are seen.”*

As for the fifth regular solid, the dodecahedron, Plato decided that it must correspond to some fifth element (or “quintessence”), and that since the number of its faces (twelve) is the number of signs in the Greek zodiac, it must be the element that the heavens are made of.

It should be stressed that Plato advanced this cosmology as a working hypothesis, not as what we would nowadays call “settled science”.

#2: As a side-note to my parable, I can’t resist mentioning that, to the Pythagoreans, the number five symbolized marriage, as it was the sum of the first “male” number (3) and the first “female” number (2). Presumably the Pythagoreans would have thought it more fitting to use the number 4 to symbolize the marriage of two females.

#3: When a long skinny object rotates, we may sometimes see it as being tall and thin (when its axis is vertical) and at other times as being low and long (when its axis is horizontal), but we don’t think anything essential about it has changed. This is all the more true if the object is stationary and we, the observers, are the ones doing the rotating. That’s because the three dimensions of space are interwoven. In Einstein’s theory of special relativity, time joins the weave, but in a different way. A clock with a circular clock-face moving at close to the speed of light will appear to run slow, and its face will not look circular. The same is true if the clock is standing still and we’re the ones who are moving. But the tempo and shape of the clock haven’t changed — just the relationship between it and the observer.
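The slowing described in this note is governed by the Lorentz factor, 1/√(1 – *v*²/*c*²). A minimal sketch, in units where *c* = 1 (the function name is mine, for illustration):

```python
import math

def lorentz_gamma(v):
    """Time-dilation factor for speed v, in units where c = 1."""
    return 1 / math.sqrt(1 - v * v)

# A clock moving past us at 80% of light speed appears to tick at
# 1/gamma = 60% of its rest rate; its face is likewise flattened
# by that same factor along the direction of motion.
print(lorentz_gamma(0.8))  # 1.666... (exactly 5/3)
```

Note the symmetry: plugging in the *same* speed tells the moving observer how slow *our* clock looks to them, which is the relational point the note is making.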

#4: One way to build up the theory of three-dimensional Euclidean geometry is to use coordinates in the manner pioneered by Descartes. Points become triples of numbers, and the distance between the point (*x*_{1},*y*_{1},*z*_{1}) and the point (*x*_{2},*y*_{2},*z*_{2}) is the square root of (*x*_{1}–*x*_{2})^{2}+(*y*_{1}–*y*_{2})^{2}+(*z*_{1}–*z*_{2})^{2}. We could build a 4-dimensional Euclidean space by using quadruples (*w*,*x*,*y*,*z*) instead of triples (*x*,*y*,*z*) and define the distance between the point (*w*_{1},*x*_{1},*y*_{1},*z*_{1}) and the point (*w*_{2},*x*_{2},*y*_{2},*z*_{2}) to be the square root of (*w*_{1}–*w*_{2})^{2}+(*x*_{1}–*x*_{2})^{2}+(*y*_{1}–*y*_{2})^{2}+(*z*_{1}–*z*_{2})^{2}. But for purposes of physics it’s better to use Minkowski space: points are still quadruples, but now our fourth coordinate is to be thought of as signifying time, and the “distance” between (*x*_{1},*y*_{1},*z*_{1},*t*_{1}) and (*x*_{2},*y*_{2},*z*_{2},*t*_{2}) is (*x*_{1}–*x*_{2})^{2}+(*y*_{1}–*y*_{2})^{2}+(*z*_{1}–*z*_{2})^{2}–(*t*_{1}–*t*_{2})^{2}. The minus sign in front of that last (*t*_{1}–*t*_{2})^{2} is crucial. This “distance” can now be negative: pairs of events at negative “distance” occur in a definite order no matter who observes them; pairs of events at positive “distance” are causally disconnected, with neither able to influence the other; and pairs at “distance” zero correspond to points in spacetime along the path of a photon.

(Some people prefer to use (*t*_{1}–*t*_{2})^{2}–(*x*_{1}–*x*_{2})^{2}–(*y*_{1}–*y*_{2})^{2}–(*z*_{1}–*z*_{2})^{2}. That also works, as long as you don’t get mixed up about which sign-convention you’re using.)
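The three cases in this note can be checked with a short sketch, using the (+, +, +, –) convention from the note and units where the speed of light is 1 (the helper names are mine, for illustration):

```python
def minkowski_interval(e1, e2):
    """Squared 'distance' between events (x, y, z, t), using the
    (+, +, +, -) sign convention, in units where c = 1."""
    x1, y1, z1, t1 = e1
    x2, y2, z2, t2 = e2
    return ((x1 - x2) ** 2 + (y1 - y2) ** 2
            + (z1 - z2) ** 2 - (t1 - t2) ** 2)

def classify(e1, e2):
    s = minkowski_interval(e1, e2)
    if s < 0:
        return "timelike"   # definite order for all observers
    if s > 0:
        return "spacelike"  # neither event can influence the other
    return "lightlike"      # along the path of a photon

origin = (0, 0, 0, 0)
print(classify(origin, (0, 0, 0, 1)))  # timelike
print(classify(origin, (2, 0, 0, 1)))  # spacelike
print(classify(origin, (1, 0, 0, 1)))  # lightlike
```

Switching to the opposite sign convention merely negates the interval, so the timelike/spacelike/lightlike classification comes out the same — which is exactly why both conventions work, provided you don’t mix them.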