— *The Exorcist* (William Peter Blatty)

Sometimes a key advance is embodied in an insight that in retrospect looks simple and even obvious, and when someone shares it with us our elation is mixed with a kind of bewildered embarrassment, as seen in T. H. Huxley’s reaction to learning about Darwin’s theory of evolution through natural selection: “How extremely stupid not to have thought of that.”

This phenomenon often arises as one learns math. Mathematician Piper H writes: “The weird thing about math is you can be struggling to climb this mountain, then conquer the mountain, and look out from the top only to find you’re only a few feet from where you started.” In the same vein, mathematician David Epstein has said that learning mathematics is like climbing a cliff that’s sheer vertical in front of you and horizontal behind you. And mathematician Jules Hedges writes: “Math is like flattening mountains. It’s really hard work, and after you’ve finished you show someone and they say ‘What mountain?’”

These descriptions apply both to people who are learning math from books and to people working at the frontier of the known, discovering entirely new things. A lot of the work one does isn’t visible to others because sometimes you need to explore a terrain thoroughly before you can find the straight path through it. I’m reminded of the parable of the king who asked the greatest artist of the land to create for him a painting of a bird.^{1} The artist said “Come back to me in a year and I will give you your painting.” When the king returned, the artist said “I am still not ready; give me another year and I will give you your painting.” This happened several times, until finally the king said “Give me the painting now, or I will have your head cut off!” Thereupon the artist whipped out a brush and, in a few quick strokes, created the most beautiful painting of a bird the king had ever seen. Astonished, the king asked “If this was so easy for you, why did you make me wait so long?” By way of answer, the artist led the king to another room containing hundreds of sketches of birds. The artist’s inspired creation only seemed to come from nowhere; it grew out of a huge mass of preparation, hidden from sight.

In math, what often happens is that we try to solve a problem using one approach, and then another, and then another, failing each time, until we finally hit on an approach that works, possibly after months or years. In a different sort of mathematical culture, researchers might be encouraged to discuss those failures and the lessons they learned from them, but in our culture, it is customary to describe only the approach that worked. This custom has the unfortunate effect of making advances seem like strokes of genius rather than fruits of effort. Then other people’s responses tend to be less like “How extremely stupid not to have thought of that” and more like “How on earth could anyone ever think of that?”^{2}

**FEFFERMAN’S DEVIL**

Mathematician Charles Fefferman has an analogy for math that I like a lot (and in fact it’s the reason why I put a chessboard grid on the pseudosphere that serves as this blog’s logo); he says that doing math research is like playing chess with the Devil. Or rather, chess with *a* devil who, although much smarter than you, is bound by an ironclad rule: although *you* are allowed at any stage to take back as many moves as you like and rewind the game to an earlier stage, the devil cannot. In game after game, the devil trounces you, but if you learn from your mistakes, you can turn his intelligence against him, forcing him to become your chess tutor. Eventually you may run out of mistakes to make and find a winning line of play. Someone who reads a record of the final version of the game (the one in which you win) may marvel at some cunning trap you set and ask “How on earth did you know that this would lead to checkmate ten moves in the future?” The answer is, you already had a chance to explore that future.^{3}

In the same fashion, when you try to construct a proof, you often go down blind alleys, but if in the end you reach your goal, you can devise a straight path. In this way, we may see Fefferman’s Devil as unknowingly laboring in the service of Paul Erdős’ God (whom I wrote about last month): a proof that seems to exhibit godlike foresight could be the result of a devilish amount of preparatory fumbling.

Erdős lived for the moments when he caught a glimpse of God’s book and the gnomic proofs it contains, but I would prefer a book that shows the process by which mere mortals find their way to such proofs, surmounting obstacles, dismissing distractions, and maintaining hope along the way. What secrets the erasers and wastebaskets of mathematicians could tell us, if only they could talk!

**THE DEVIL’S BOOK**

Mathematician Doron Zeilberger has noticed the mathematical community’s preference for elegant proofs over proofs that solve problems by brute force, and he’s not happy about it. As a natural contrarian he’s suspicious of consensus and believes (or is willing to pretend to believe) that the future progress of mathematics depends on ugly computer proofs, not the kind of beautiful proofs people like. In his view, the finding of beautiful proofs will become an eccentric pastime of human mathematicians, while their electronic counterparts, untrammeled by our species’ odd notions of beauty, will make the real advances. Over time, our tools may become our masters, and we their pets.

If Zeilberger is right, mathematical historians of the future, be they humans or computers, will view the year 1976 as pivotal. That was the year in which mathematicians Kenneth Appel and Wolfgang Haken gave the mathematical community a solution to the century-old Four Color Problem that involved a huge hunk of brute-force computation. The problem itself can be explained to a child in five minutes; the proof found by Appel, Haken, and their computer would take years of toil for a human to verify with pencil and paper. A more recent and extreme example of an unreadable proof is described by Kevin Buzzard in his essay “A computer-generated proof that nobody understands”.

Zeilberger doesn’t just predict a brave new mathematical world; he’s doing his best to bring it into being. Along with the late Herb Wilf, Zeilberger created a mathematical technology for automating the proofs of a broad class of equations (see their book “A=B” written with Marko Petkovsek), and over the course of his career he has delighted in finding brute-force approaches to problems. For instance, consider Morley’s trisector theorem, a beautiful piece of Euclidean geometry that wasn’t discovered until the 19th century. (I’ll tell you more about it next Thirdsday.) There’s a beautiful proof found by John Conway, but Zeilberger wasn’t interested in finding a beautiful proof; he wanted a proof it doesn’t take a Conway to discover. So he found the ugliest proof: a brute-force algebraic verification that gives us complete certainty that Morley’s theorem is true but zero insight into *why* it is true.
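Zeilberger’s verification is symbolic, but you can get a taste of the brute-force spirit with a few lines of Python of my own devising (not Zeilberger’s actual computation): build the angle trisectors of a triangle, intersect the adjacent pairs, and check numerically that the three intersection points form an equilateral triangle. The check gives certainty (up to floating-point error) for any particular triangle you feed it, and exactly zero insight.

```python
import cmath

def morley_triangle(A, B, C):
    """Given the vertices of a counterclockwise triangle (as complex
    numbers), return the three Morley points: the intersections of the
    pairs of angle trisectors adjacent to each side."""
    def angle(P, Q, R):
        # interior angle at vertex Q of triangle PQR
        return abs(cmath.phase((P - Q) / (R - Q)))
    def cross(z, w):
        return z.real * w.imag - z.imag * w.real
    def adjacent_trisector_meet(P, Q, aP, aQ):
        # Trisector from P nearest side PQ: rotate direction P->Q into the
        # interior by a third of the angle at P; similarly at Q. Intersect.
        d1 = (Q - P) * cmath.exp(1j * aP / 3)
        d2 = (P - Q) * cmath.exp(-1j * aQ / 3)
        t = cross(Q - P, d2) / cross(d1, d2)
        return P + t * d1
    a, b, c = angle(B, A, C), angle(C, B, A), angle(A, C, B)
    return (adjacent_trisector_meet(B, C, b, c),   # point nearest side BC
            adjacent_trisector_meet(C, A, c, a),   # point nearest side CA
            adjacent_trisector_meet(A, B, a, b))   # point nearest side AB

# A lopsided test triangle (listed counterclockwise): the Morley points
# form an equilateral triangle no matter what triangle we start with.
P1, P2, P3 = morley_triangle(0 + 0j, 5 + 0j, 1 + 3j)
sides = [abs(P1 - P2), abs(P2 - P3), abs(P3 - P1)]
assert max(sides) - min(sides) < 1e-9
```

Of course, checking one triangle numerically proves nothing about all triangles; Zeilberger’s point was that the same sort of computation, done symbolically with the coordinates left as variables, really does constitute a proof.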

Zeilberger wrote that the devil, too, has a book, and he imagined that his proof of Morley’s theorem belonged there. This book would contain all the boring, inelegant proofs missing from God’s book as conceived by Erdős. Actually, Zeilberger called both books notebooks, and this idea of the two books as evolving documents fits in nicely with thoughts about ugly and beautiful mathematics voiced by mathematician G. H. Hardy. Hardy wrote “There is no permanent place in the world for ugly mathematics” while conceding that temporary ugliness is an essential feature of doing mathematics; you can’t build cathedrals without putting up scaffolds.

Haken and Appel’s proof didn’t end the story of the Four Color Theorem; it led to a shorter proof, and the quest for even shorter proofs continues. Meanwhile, Erdős’ love of elegance didn’t stop him from being phenomenally productive, and most of the proofs he found fell short of his high standard for “Book proofs”. So maybe we don’t have to choose between God’s book and the devil’s? Maybe we can honor both?

**THE LAST LAUGH**

There’s a sense in which the Devil claims the final victory. Although there is not and may never be an uncontroverted notion of what constitutes mathematical elegance, there are objective ways to measure the simplicity of a proof as a kind of surrogate for elegance, and we might imagine that for every simple problem there is a simple solution that we could discover, at least in principle, by staying at the devil’s chess-table long enough. That is, for every easy-to-state theorem we might hope that there would be a proof that isn’t necessarily easy to find but which, once found, could be verified by humans (possibly with computer assistance). And here comes the real deviltry. Thanks to the tricks discovered in the 20th century by Kurt Gödel, Alan Turing, and Gregory Chaitin, reason can be turned against itself to show that there are bound to be theorems whose proofs are all obscenely long in comparison with the length of the theorem itself. This is related to Turing’s discovery that there is no sieve to unerringly sift the provable from the disprovable, and Gödel’s discovery that in any sufficiently advanced formal system there will be propositions that are *undecidable*: neither provable nor disprovable.^{4}

We don’t know the location of the border between the kind of math we care about and the kind Gödel et al. warned us about. Logician Harvey Friedman thinks the border may be closer than we think, and has spent the past few decades devising ever-more disquieting examples of problems that are haunted by the ghost of undecidability. We may someday find an easily-stated truth with no proof that can be uttered (let alone checked) in a lifetime, and we may never recognize the theorem as true. Our human mathematics may be a game limited to the shallows of reason; the farther out we wade, the greater the chances of being pulled out to sea by the undertow. Computers may enable us to go a little deeper, but there are limits to what beings in our universe, however constituted, can hope to do in the space and time allotted to us. Beyond what we can know, or ever will be able to know, there is a Void with a ragged beginning and no end. Is it laughing at us?

*Thanks to Kevin Buzzard, Sandi Gubin, Piper H, Cris Moore, Ben Orlin and James Tanton.*

**ENDNOTES**

#1. I’m reconstructing this parable from memory, so I may have some details wrong. I couldn’t find this on the web, but surely one of you can find the source!

#2. I recently came across a great quote from mathematician Gian-Carlo Rota: “Philosophers and psychiatrists should explain why it is that we mathematicians are in the habit of systematically erasing our footsteps.” I discuss the phenomenon in my essay “The Genius Box”.

#3. Contrast Fefferman’s harmless devil with the villains in fantasy fiction, who can and will kill us and the people we care about. If fantasy novels were an unbiased sample of imaginary worlds, the vast majority would end mid-book, with the main character falling prey to some otherworldly peril her past experiences hadn’t prepared her for. The books we actually read are governed by a monumental amount of survivorship bias. In my zeroeth Mathematical Enchantments essay I wrote that math is my consolation for living in a world without magic, but really, I’m a big enough coward that if I were offered a passport to magical realms I’d probably turn it down. The worlds of fantasy that I like best have rules; if you run afoul of those rules, you die, and there is no reset button. Given a choice of adversaries, I’ll take Fefferman’s devil anytime.

#4. Actually, there is an escape clause from undecidability (but you won’t like it): given a formal system for proving theorems about counting-numbers-and-things-like-that, there may be an infallible way for us to recognize which theorems are provable in our system and which aren’t, but only if the task becomes trivial (all theorems are provable, none aren’t) and pointless (if all theorems are provable in our system, then provability can’t be telling us much about what’s true). That’s what happens if our formal system is inconsistent. Very few mathematicians are seriously worried that the systems that undergird mathematics (such as Peano Arithmetic or Zermelo-Fraenkel set theory) might harbor contradictions, and most of us have faith that even if these particular systems turn out to be flawed, the flaws can be fixed. But if no fix exists — if there is no way to put our mathematics on firm foundations — then I suspect the devil’s laughter fills the mathematical universe from one nonexistent end of time to the other.


— Paul Erdős

Creating gods in our own image is a human tendency mathematicians aren’t immune to. The famed 20th century mathematician Paul “Uncle Paul” Erdős, although a nonbeliever, liked to imagine a deity who possessed a special Book containing the best proof of every mathematical truth. If you found a proof whose elegance pleased Erdős, he’d exclaim “That’s one from The Book!”

I’m a fan of Erdős, but today I’ll argue that the belief that every theorem has a best proof is misguided.^{1}

Nowadays there’s a terrestrial shadow of Erdős’ celestial book, called “Proofs From THE BOOK”, and its authors Martin Aigner and Günter Ziegler explicitly disavow the idea that each theorem has a best proof. Their very first chapter contains six different proofs of the existence of infinitely many prime numbers. Ziegler, in an interview in Quanta Magazine, said: “There are theorems that have several genuinely different proofs, and each proof tells you something different.” Most mathematicians (and maybe even Erdős himself) would agree.

**A PROBLEM ABOUT TILINGS**

A great illustration of this “Let a hundred proofs bloom” point of view is provided by an article by Stan Wagon called “Fourteen Proofs of a Result About Tiling a Rectangle”. Here’s the result his title refers to (a puzzle posed and solved by Nicolaas de Bruijn): Whenever a rectangle can be cut up into smaller rectangles each of which has at least one integer side, then the big rectangle has at least one integer side too. (Here “at least one integer side” is tantamount to “at least two integer sides”, since the opposite sides of a rectangle always have the same length.)

A tiling means a dissection with no gaps or overlaps. Here’s a picture of the sort of tiling we’re talking about (taken from Wagon’s article).

We see a big rectangle tiled by small rectangles, where each of the small rectangles is marked with either an *H* (to signify that the horizontal side-length is an integer) or a *V* (to signify that the vertical side-length is an integer). I hope you’ll agree that it’s not obvious at all that the big rectangle must have an integer side! You might want to play around with the general problem for a bit to convince yourself both that it’s true *and* that it’s not obvious how to prove it.

Wagon presents fourteen elegant proofs of this result, contributed by a variety of colleagues; I’ll show you two of them (which can also be found in chapter 29 of Aigner and Ziegler’s book). In both proofs, we call the tiled rectangle *R*, and we let *a* and *b* be the width and height of *R*, respectively.

I’ll demonstrate, in two different ways, that at least one of the two numbers *a*, *b* must be a whole number. You may have an esthetic preference for one proof or the other, but neither of them is dispensable because each of them points in directions that the other doesn’t.

**THE CHECKERBOARD PROOF**

Divide the plane into ½-by-½ squares alternately colored black and white as in a checkerboard and superimpose this checkerboard with the tiled rectangle, so that the lower left corner of the tiled rectangle is a lower-left corner of a black square in the checkerboard. (Why is the checkerboard coloring relevant and helpful? Why do we want ½-by-½ squares instead of 1-by-1 squares? Wait and see!) I’ll draw the black squares as gray squares to make it easier for you to see the checkerboard and the tiling at the same time.

Here’s a proof that at least one of the two numbers *a*, *b* must be a whole number (with some of the details deferred to the Endnotes):

(1) Because each of the rectangles that tile *R* has an integer side, each tile *T* contains an equal amount of black and white; that is, the black part of *T* and the white part of *T* have the same area. (See Endnote #2 for justification.)

(2) Therefore *R* (being composed of tiles) contains an equal amount of black and white.

(3) Now ignore the tiles and focus on the *a*-by-*b* rectangle *R*. Suppose *a* and *b* aren’t integers. Let *m* and *n* be the integer parts of *a* and *b* respectively, and write *a*=*m*+*r* and *b*=*n*+*s*, with *r* and *s* between 0 and 1. Split *R* into an *m*-by-*n* rectangle, an *r*-by-*n* rectangle, an *m*-by-*s* rectangle, and an *r*-by-*s* rectangle, as in the picture below. Each of the first three rectangles has an integer side and therefore has equal amounts of black and white (as in (1)) but the fourth has more black than white. (See Endnote #3 for justification of that last assertion.)

(4) Therefore *R* contains more black than white.

(5) Since (2) and (4) contradict each other, our supposition that *a* and *b* aren’t integers is incompatible with the assumption that each of the tiles has an integer side.

(For a related approach, see Endnote #4.)
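The bookkeeping in steps (1)–(4) can be checked by machine. The signed area (black minus white) of any axis-aligned rectangle on the ½-by-½ checkerboard factors as a product of two one-dimensional quantities — essentially the trick of Endnote #4 — and a few lines of Python (my own sketch, not code from Wagon’s article) make the key facts concrete:

```python
def w(t):
    """Integral from 0 to t of the +1/-1 coloring along one axis: a
    triangle wave with period 1 that vanishes exactly at the integers."""
    f = t % 1.0
    return min(f, 1.0 - f)

def black_minus_white(x1, y1, x2, y2):
    """Signed area (black minus white) of the rectangle [x1,x2] x [y1,y2]
    on the half-unit checkerboard whose black squares include
    [0,1/2] x [0,1/2]: the double integral of the coloring separates
    into a product of two one-dimensional integrals."""
    return (w(x2) - w(x1)) * (w(y2) - w(y1))

# Step (1): a tile with an integer side is balanced.
assert abs(black_minus_white(0.3, 0.7, 2.3, 1.9)) < 1e-12   # width 2
assert abs(black_minus_white(0.25, 1.0, 0.8, 4.0)) < 1e-12  # height 3

# Step (3): the leftover r-by-s corner piece is strictly unbalanced.
r, s = 0.7, 0.9   # both strictly between 0 and 1
assert black_minus_white(2.0, 3.0, 2.0 + r, 3.0 + s) > 0
```

The product form also packages the whole proof: the big rectangle’s imbalance is w(*a*)·w(*b*), and since the tiles force the imbalance to be 0, either w(*a*) or w(*b*) must vanish, i.e., *a* or *b* is an integer.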

**THE TILES-AND-CORNERS PROOF**

Draw the standard Cartesian coordinate frame, with horizontal and vertical axes intersecting at (0,0), and superimpose this picture with the tiled rectangle, so that the lower left corner of the tiled rectangle is (0,0). (Why are coordinates relevant and helpful? Wait and see!)

We’ll put a black dot at the center of each tile and a white dot at each tile corner whose *x*- and *y*-coordinates are both integers (let’s call a point like this an “integer corner”), and we’ll draw a dashed line from a black dot to a white dot if the black dot is at the center of a tile and the white dot is at one of the four corners of that tile.

The heart of the argument is to count the dashed lines in two different ways, with each way of counting providing different information about the total. One way to count the dashed lines is to consider how many dashed lines emanate from each black dot, and add up those numbers (in the picture above, that would give a 4 and four 2’s); the other way is to consider how many dashed lines come into each *white* dot, and add up *those* numbers (in the picture above, that would give five 2’s and two 1’s).

(1) Each tile has 0, 2, or 4 integer corners. (See Endnote #5.) So each black dot has an even number of dashed lines emanating from it.

(2) Therefore the total number of dashed lines, being a sum of even numbers, is even.

(3) Suppose *a* and *b* are non-integer, so that (*a*,0), (0,*b*), and (*a*,*b*) are not integer corners and there are no white dots there. Then the white dot at (0,0) lies on one dashed line (joining it to the black dot in the middle of the lower-leftmost tile) and every other white dot is a corner of either 2 tiles (as shown above) or 4 tiles (as shown below), so each of those other white dots lies on an even number of dashed lines.

(4) The total number of dashed lines is equal to the number of dashed lines passing into (0,0) (which is 1) plus the number of dashed lines passing into the other integer corners (which, being a sum of even numbers, is even). If you add a bunch of numbers, one of which is odd and the rest of which are even, the total is odd, so the number of dashed lines is odd.

(5) Since (2) and (4) contradict each other, our supposition that *a* and *b* are non-integer is incompatible with the assumption that each of the tiles has an integer side.
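Here’s the two-way count carried out in code on a small made-up tiling (my own example, not the one in Wagon’s figure): a 2.5-by-2 rectangle cut into three tiles, each with at least one integer side.

```python
from fractions import Fraction as F

# A hypothetical tiling of a 2.5-by-2 rectangle; each tile is
# (x1, y1, x2, y2), and each has at least one integer side.
tiles = [(F(0), F(0), F(1),    F(2)),   # 1 x 2   (integer width and height)
         (F(1), F(0), F(5, 2), F(1)),   # 1.5 x 1 (integer height)
         (F(1), F(1), F(5, 2), F(2))]   # 1.5 x 1 (integer height)

def integer_corners(t):
    x1, y1, x2, y2 = t
    return [(x, y) for x in (x1, x2) for y in (y1, y2)
            if x.denominator == 1 and y.denominator == 1]

# Count the dashed lines tile by tile (black dots)...
per_tile = [len(integer_corners(t)) for t in tiles]
# ...and corner by corner (white dots).
per_corner = {}
for t in tiles:
    for c in integer_corners(t):
        per_corner[c] = per_corner.get(c, 0) + 1

assert sum(per_tile) == sum(per_corner.values())  # same dashed lines
assert all(n % 2 == 0 for n in per_tile)          # step (1): each is 0, 2, or 4
```

In this example the corner (0,0) contributes an odd count of 1, but so does the corner (0,2), precisely because the height *b* = 2 is an integer; remove that escape hatch and you get the contradiction of step (5).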

**TRICKS AND METHODS**

Back in the 1920s, mathematician George Pólya wrote: “An idea that can be used only once is a trick. If one can use it more than once it becomes a method.”

What would Pólya have said about the two proofs of de Bruijn’s theorem that appear above? Is the idea of imposing a checkerboard coloring a trick or a method? What about the two-ways-of-counting idea?

Here are a couple of problems you might try to solve using those ideas.

**The Mutilated Checkerboard Problem**: Consider an 8-by-8 square from which two diagonally opposite 1-by-1 squares have been removed, with total area 64 − 2 = 62. Is there a way to tile it with 31 1-by-2 rectangles (which can be either horizontal or vertical)? See Endnote #6.

**The Half-Friendly Party Problem**: Someone tells you “I was at a party with 6 other people, and interestingly, each of us was friends with exactly half of the other 6 people.” (Assume that if person A is friends with person B, then person B is friends with person A; also assume that no person is their own friend.) Clearly the person who told you this is a nerd and a bore, but are they also a liar? See Endnote #7.

I’d modify Pólya’s dictum and say that when a trick works in multiple settings, it’s still a trick, but it goes into a big bag containing all the tricks you’ve ever seen, and a “method” you can use when attacking a new problem is rummaging through the bag and asking yourself “Which of these old tricks might solve this new problem?” Part of one’s mathematical education is filling the bag.

Pólya made a brave start at sharing his own personal bag of tricks and his general approach to problem-solving in the book “How To Solve It”, and more recently there was an effort to crowd-source the collation of mathematical problem-solving tricks at tricki.org. Say someone who’s never seen de Bruijn’s theorem before is trying to prove that there’s no way to divide a rectangle that has no integer side-lengths into smaller rectangles that each have an integer side-length, but they’re stuck. If they go to the Tricki main page and click on “What kind of problem am I trying to solve?” and then click on “Techniques for proving impossibility and nonexistence” and then click on “Invariants” and then click on “Elementary examples of invariants”, they can see both of the proofs I’ve just shown you. Unfortunately the search feature of the Tricki site is very primitive, which limits the usefulness of the site. But it’s a start.

**WHAT IS BEST?**

There’s an episode of “The Office” in which the character Jim Halpert pulls a prank on his deskmate Dwight Schrute by doing a brilliant imitation of Dwight’s picayune pomposity, asking and then answering the nonsensical question “What kind of bear is best?” The question is hilariously idiotic on many levels (Dwight himself declares the question to be “ridiculous” even though it’s an exaggerated version of the sorts of things he himself says). One aspect of the idiocy is that “best” has no clear meaning in this context. If you need a bear that can claw its way through permafrost, you probably want a polar bear; if you need one that can climb a tree, you probably don’t. “What bear is best?” deserves the response “Best *at what*?” Likewise, the question “What proof is best?” deserves the response “Best *for what*?” and “Best *for whom*?”

Let’s tackle “Best for whom?” first. As Ziegler says, “For some theorems, there are different perfect proofs for different types of readers. I mean, what is a proof? A proof, in the end, is something that convinces the reader of things being true. And whether the proof is understandable and beautiful depends not only on the proof but also on the reader: What do you know? What do you like? What do you find obvious?”

The meaning of “Best for what?” requires some elaboration. No problem or theorem is an island, and the theorem we’ve been discussing should be understood not as an isolated puzzle but as part of a family of related problems. For instance, we might go into higher dimensions and consider a tiling of a three-dimensional box *B* by smaller boxes, each of which has at least one of its three side-lengths being integers. Must at least one of the three side-lengths of *B* be an integer? The answer is yes, and both the checkerboard proof and the tiles-and-corners proof can be modified to prove this variant of the tiling-a-rectangle problem.

Here’s another variant: Look at a tiling of a three-dimensional box *B* by smaller boxes, each of which has integer side-lengths in at least **two** of the three coordinate directions. Must *B* have integer side-lengths in at least two of the three directions? The answer is again yes, and the checkerboard proof can be adapted to solve this problem — but the tiles-and-corners proof can’t.

Should we conclude from this that the checkerboard proof is superior to the tiles-and-corners proof? No! Here’s a variant that the tiles-and-corners approach handles easily but the checkerboard approach doesn’t: Show that, whenever a rectangle is tiled by rectangles each of which has at least one **rational** side, then the tiled rectangle has at least one rational side.

Neither of the two proofs of the original theorem supplants the other; each provides insights that the other lacks, and leads in directions that the other can’t. Wagon considers fourteen proofs in total, and shows that there is no best proof in the bunch; each has limitations as well as strengths.

Once we start to view math problems not as isolated puzzles but as part of a huge interconnected tapestry − a tapestry that the mathematical community is constantly exploring and extending − then we see that the idea of the One Best Proof fails to do justice to the richness of the tapestry. Which proof is best, you ask? Well, that depends: where do you want to go next in your exploration of the tapestry? If you want to head in *this* direction, solution A might be good; if you want to go *thataway*, solution B might be better.

One best proof? Sorry, Uncle Paul; I say “False”. You don’t have to disbelieve in The Book, but you should disbelieve *that*.

*Thanks to Sandi Gubin, David Jacobi, Joel Spencer, Stan Wagon, and Günter Ziegler.*

Next month (Feb. 17): Chess with the Devil.

**ENDNOTES**

#1: Maybe I’m being unfair to Erdős here, but as far as I know he never modified his original view that each theorem has just one Book proof. Then again, Ziegler (in private communication) remarks: “As I remember Uncle Paul, I think it was not important for him to be right, also in this point, but it was important to him to have a good story to tell, and he did.”

#2: We’re going to prove this claim by literally tearing it to pieces, or rather by tearing the rectangle to pieces. For definiteness we’ll focus on the case where the height of the rectangle is a whole number (since the argument for the case where the width is a whole number is essentially the same).

If there’s an *x*-by-*n* rectangle in the plane, where *x* is a real number and *n* is a whole number, you can tear it into *n* *x*-by-1 strips, so IF we can show that each of those *x*-by-1 strips contains equal amounts of black and white, THEN we’ll know that the whole *x*-by-*n* rectangle does too. (Here’s a picture of what that slicing-up looks like when *n* is 3.)

Are we done cutting the problem down to size? By no means! If an *x*-by-1 strip happens to intersect some of the vertical lines of the checkerboard, you can cut that strip into pieces, using the vertical lines of the checkerboard as cut-lines. As before, IF we can show that each of those new sliced-and-diced pieces has as much black as white, THEN we’ll be able to conclude that the whole strip does.

So now we’ve reduced the claim to tiny rectangles that fit between two consecutive vertical lines in the ½-by-½ checkerboard, and that have height 1 (here I’ve blown up the picture for intelligibility):

Since the tiny rectangle has height 1 and the black part in its middle has height ½, it’s clear that the tiny rectangle is half black and hence half white as well. (If you look at this last part of the argument, you’ll see where we need the fact that all the checkerboard squares are ½-by-½.) Rewinding the argument, we’ve shown that the *x*-by-1 strips are half black and half white, and that shows that the *x*-by-*n* rectangle is half black and half white.

#3: We want to show that if *r* and *s* are between 0 and 1, then an *r*-by-*s* rectangle with a black checkerboard square nestled in its lower left corner has more black than white. The most interesting case is when *r* and *s* are both bigger than ½, as in the picture. Write *r* = ½ + *t* and *s* = ½ + *u*, with *t* and *u* between 0 and ½.

The *r*-by-*s* rectangle consists of a black ½-by-½ square, a white *t*-by-½ rectangle, a white ½-by-*u* rectangle, and a black *t*-by-*u* rectangle. So the black area minus the white area equals (½)(½) − (*t*)(½) − (½)(*u*) + (*t*)(*u*) = (½ − *t*)(½ − *u*), which (being a product of two positive numbers) must be positive.

#4: A variant of the checkerboard proof uses the fact that the black area of *R* minus the white area of *R* can be written as the product of two numbers, *L* and *M*, where *L* is the black length of the bottom edge of *R* minus the white length of the bottom edge of *R* and *M* is the black length of the left edge of *R* minus the white length of the left edge of *R*. (By the “black length of the bottom edge of *R*” I mean the sum of the side-lengths of the black squares that adjoin the bottom edge of *R*. Similarly for the other black and white lengths.) You might check that the punchline of Endnote #3 uses this trick.

#5: Write the corners as (*x*,*y*), (*x*′,*y*), (*x*,*y*′), and (*x*′,*y*′). Suppose the tile has integer width. Then *x*′ and *x* differ by an integer, so either (*x*,*y*) and (*x*′,*y*) are both integer corner points or neither one is, and ditto for (*x*,*y*′) and (*x*′,*y*′). So the number of integer corners of the tile is 0, 2, or 4. The same conclusion holds in the case where the tile has integer height.

#6: The word “checkerboard” in the name of this classic problem gives away the trick (excuse me, method). The two removed squares have the same color (black, say), so the resulting mutilated board has more white than black squares. On the other hand, if we could tile the mutilated board with 1-by-2 rectangles, then the board would have to contain equal amounts of white and black, since each individual tile does. This contradiction shows that no such tiling is possible.
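In code, the counting argument is a couple of comprehensions (a sketch of mine, with squares indexed by row and column):

```python
# Squares of the 8-by-8 board, minus two diagonally opposite corners.
removed = {(0, 0), (7, 7)}
squares = [(i, j) for i in range(8) for j in range(8)
           if (i, j) not in removed]

black = sum(1 for (i, j) in squares if (i + j) % 2 == 0)
white = len(squares) - black

# Both removed corners had the same color, so the counts are unbalanced...
assert (black, white) == (30, 32)
# ...but each 1-by-2 tile covers one black and one white square, so 31
# tiles would cover 31 of each color: no tiling exists.
assert black != white
```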

#7: How many friend-pairs were at the party? For each person *x*, we can count the number of people at the party who are friends with *x*, and then add up all those numbers; that should give us the number of friend-pairs times two, since each friend-pair gets counted twice by this method. (If *x* and *y* are friends, then *y* counts as a friend of *x* and *x* counts as a friend of *y*.) So we must get an even number.

But: each of the 7 people is friends with 3 people, and 3+3+3+3+3+3+3 is odd. Contradiction!
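The same double count is easy to automate (a sketch; parity is of course only a necessary condition for a friendship graph to exist, not a sufficient one). The 6-person party below, wired up as two groups of three with friendships only across the groups, shows that the even case really can happen:

```python
def plausible(claimed_degrees):
    """A friendship graph can exist only if the degree total is even,
    since each friend-pair gets counted twice."""
    return sum(claimed_degrees) % 2 == 0

assert not plausible([3] * 7)   # the boastful nerd is a liar
assert plausible([3] * 6)       # but 6 people could each have 3 friends

# A witness for the 6-person case: everyone in {0,1,2} is friends with
# everyone in {3,4,5} and with nobody else.
friends = {(a, b) for a in (0, 1, 2) for b in (3, 4, 5)}
degree = lambda v: sum(1 for pair in friends if v in pair)
assert all(degree(v) == 3 for v in range(6))
```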

#8: I hope to present a version of this essay at the 14th annual Gathering for Gardner in March 2020.

**REFERENCES**

Martin Aigner and Günter M. Ziegler, “Proofs from THE BOOK”, Springer (Sixth Edition 2018).

Erica Klarreich, “In Search of God’s Perfect Proofs”, Quanta Magazine.

George Pólya, “How To Solve It” (1945). For a quick summary of some of the main ideas, look at the handout George Melvin created for a course he taught at Berkeley.

Stan Wagon, “Fourteen Proofs of a Result About Tiling a Rectangle”.

We got the points, but I liked the other team’s answer better. The idea of an empty salad might seem like a purely mathematical fancy, but half a dozen years later I saw a restaurant menu that offered the null salad, or rather “Nowt, served with a subtle hint of sod all” (for the unbeatable price of 0 pounds and 0 pence).^{2}

Today I’ll tell you how to make null salad — not just the tossed kind, but also the kind that’s artfully composed of stacked leafy greens, except that there aren’t any greens in the stack. The trick to preparing it is knowing when to stop, namely, before you start, and the hardest part of all isn’t doing it, but correctly counting how many ways there *are* to do it. You might think the correct count is 0, but it’s not. Coming to understand why the answer isn’t 0 is tricky; it hinges on understanding the difference between a task that’s impossible to do and a task that’s impossible to start because it’s already finished.^{3} Or, putting it differently, it’s about the difference between doing the impossible and doing nothing. There are exactly 0 ways to do the impossible, but in many mathematical settings, there’s exactly 1 way to do nothing.

**COUNTING TOSSED SALADS**

Of course the original problem is a bit silly, since it assumes that a salad that’s 90% arugula and 10% basil is the same as a salad that’s 10% arugula and 90% basil (but if we didn’t make that assumption, there’d be too many different salads to count and no clear rules for counting them). The problem also ignores the fact that some combinations of greens might not be palatable (but if we took palatability into account, once again there’d be no clear rules for counting the possibilities).

The inclusion of the word “tossed” might seem incidental, but it actually serves an important mathematical role; it tells us that the ingredients are mixed together higgledy-piggledy, so that their arrangement within the bowl isn’t what we’re interested in. All that we’re supposed to care about is which ingredients get used (no matter how little) and which ingredients get left out.

Let’s stick to just arugula and basil for a bit. We have a choice about whether to include arugula or not, and we have a choice about whether to include basil or not. If we disallow the empty salad, then these choices are linked, because deciding to omit arugula would force us to include basil (and deciding to omit basil would force us to include arugula). But if we allow the empty salad, then the choices can be made independently of each other. For each of the two possible ways of deciding about arugula (arugula or no arugula?), we get two possible ways of deciding about basil (basil or no basil?). So the situation becomes symmetrical with regard to inclusion versus exclusion: there are exactly as many salads that include arugula as there are salads that exclude arugula, and ditto for basil. And that’s a good thing, because by and large symmetrical definitions are easier to work with.

Once we’ve decided that it’s expedient to include the empty salad, we find that there are 4 different salads that can be made with arugula and basil as allowed ingredients: arugula-and-basil, arugula-only, basil-only, and the empty salad.

What happens when we include celery as an allowed ingredient? Each of the 4 salads that can be made with arugula and basil as allowed ingredients gives rise to 2 different salads according to whether we use celery or not; so there are twice 4, or 8, salads that can be made with arugula, basil, and celery as allowed ingredients.

I think you see where this is going: when we add a fourth allowed ingredient (dandelion), the number of possibilities becomes twice 8, or 16, and when we add a fifth allowed ingredient (endive), the number of possibilities becomes twice 16, or 32. Thinking about this pattern, we see that the number of different salads we can make with *n* allowed ingredients is 2 times 2 times 2 times … times 2, where the number of 2’s is *n*. We write this number as 2^{n} for short.
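If you’d like to check this count by brute force, here’s a short Python sketch of mine (a tossed salad is just a subset of the allowed ingredients):

```python
from itertools import combinations

def all_tossed_salads(ingredients):
    # A tossed salad is a subset of the allowed ingredients;
    # size 0 gives the empty salad.
    salads = []
    for size in range(len(ingredients) + 1):
        salads.extend(combinations(ingredients, size))
    return salads

greens = ["arugula", "basil", "celery", "dandelion", "endive"]
for n in range(len(greens) + 1):
    print(n, len(all_tossed_salads(greens[:n])))  # prints n, 2**n
```

Running it lists the counts 1, 2, 4, 8, 16, 32 as the allowed ingredients accumulate, and the empty tuple (the empty salad) appears in every list.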

In contrast, if we’d chosen to disallow the empty salad, we would have gotten the messier answer 2^{n}−1.

**BACK TO ZERO**

Let’s pause and look at the expression 2^{n} for a bit. It’s defined as the product 2 × 2 × 2 × … × 2, where the number of 2’s is *n* and the number of multiplication signs is *n−*1. This makes sense when *n* is bigger than 1, and it even makes sense, sort of, when *n* is 1: the number of 2’s is 1 and the number of multiplication signs is 0, so our “product” looks like just “2”, whose value is clearly 2 (even though “2” by itself isn’t a product in the ordinary sense).

But what if *n* is 0? Once upon a time, when nobody had yet defined 2^{0}, mathematicians had to make up their collective mind about what they wanted it to mean. They could have left it undefined, because it makes no sense to speak about a product of 2’s in which the number of 2’s is 0 and the number of multiplication signs is *−*1. But it became clear pretty quickly that there are contexts in which it’s useful to define 2^{0} to be 1 (and few or no contexts in which it’s useful to define 2^{0} to be some other value). One good feature of filling in the blank in “__, 2, 4, 8, 16, 32, …” with a 1 is that the resulting sequence 1, 2, 4, 8, 16, 32, … follows a uniform rule: each term is twice the term before (or, going backwards, each term is half of the term that follows).

For the practical-minded, here’s a financial application. Consider a bank that gives interest at the (unrealistic but simple-to-discuss) rate of 100% compounded annually. After *n* years, where *n* is any positive integer you like, an initial deposit of $100 will become $100 times 2^{n}. What’s the situation after 0 years? If “after 0 years” means “right after you’ve deposited your money”, then you have $100 in your account and not a penny more or a penny less, so it makes sense that 2^{0} should be taken to equal 1.
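A few lines of Python (a sketch of mine, with the same unrealistic 100% rate) make the same point: the loop simply never runs in the 0-years case, leaving the original deposit untouched.

```python
def balance_after(years, deposit=100):
    # 100% interest compounded annually: the balance doubles each
    # year, so after n years it equals deposit * 2**n.
    balance = deposit
    for _ in range(years):
        balance *= 2
    return balance

print(balance_after(3))  # 800, i.e., 100 * 2**3
print(balance_after(0))  # 100, which forces 2**0 == 1
```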

The permissive notion of salads that goes hand-in-hand with the conventional definition 2^{0} = 1 gives us a new way to see the null salad shining on its own, and not suffering from invidious comparisons with its more filling fellow-salads: if there are 0 ingredients to choose from, then we can make exactly 1 tossed salad from them, namely the null salad.

**COUNTING LAYERED SALADS**

What if we have salads in which structure matters? Let’s throw culinary realism to the winds here and consider salads that contain a single arugula leaf, a single basil leaf, a single celery leaf, a single dandelion leaf, and a single endive leaf. (Never mind that a true composed salad should have a base, a body, and a garnish.) How many such salads are there? More generally, if there are *n* different kinds of leaf, how many ways are there to build a salad by stacking together one leaf of each kind? (Yes, I know, if you stack actual leaves they won’t stay stacked. But this is math, not cuisine.)

When *n* is 1, there’s only 1 way to go.

When *n* is 2, there are 2 choices of which leaf to put on the bottom, and then we’re forced to put the other leaf on the top, so there are 2 possible salads.

When *n* is 3, there are 3 choices of which leaf to put on the bottom, and then 2 remaining choices for what to put on top of that, and then only 1 remaining option for what to put on the top, so the total number of possibilities is 3 × 2 × 1 = 6.

Again, I think you see where this going: when *n* is 4, the total number of possibilities is 4 × 3 × 2 × 1 = 24, and when *n* is 5, the total number of possibilities is 5 × 4 × 3 × 2 × 1 = 120.

We have the symbol “*n*!” (pronounced “*n* factorial”) for the product of the counting numbers from 1 through *n*. So now we are ready to ask the question, how should 0! be defined?

Looking at the sequence __, 1, 2, 6, 24, 120, … in reverse, we see that starting from 120, we first divide by 5 (obtaining 24), then divide by 4 (obtaining 6), then divide by 3 (obtaining 2), then divide by 2 (obtaining 1). So if we want the pattern of divisions to continue, we should next divide by 1 (obtaining 1). By this reasoning, the right value for 0! is 1.
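Here’s that backward-division pattern in Python (my own sketch); notice that the loop in `factorial` never executes when *n* is 0, so the “empty product” 1 falls out automatically:

```python
def factorial(n):
    # The product 1 * 2 * ... * n; when n == 0 the loop body never
    # runs, and the empty product is 1.
    product = 1
    for k in range(1, n + 1):
        product *= k
    return product

seq = [factorial(n) for n in range(6)]
print(seq)  # [1, 1, 2, 6, 24, 120]
# Reading it in reverse: 120/5 = 24, 24/4 = 6, 6/3 = 2, 2/2 = 1,
# and continuing the pattern, 1/1 = 1 -- so 0! should be 1.
```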

If you prefer a more practical reason for the convention 0! = 1, consider tossing a fair coin *n* times in some gambling game. What’s the probability of getting heads *a* times and tails *b* times? The binomial formula

(*a* + *b*)! / (*a*! × *b*! × 2^{a+b})

gives the right answer when *a* and *b* are both positive integers (let me know in the Comments if you know a well-written online reference for this formula that explains why it’s true!). It’s not important to understand where the formula comes from; what I want you to notice is that if you want it to give the right answer when *a*=0 or *b*=0, you’d better take 0! = 1. (If you take 0! = 0 then the expression has a 0 in the denominator and hence makes no sense; if you take 0! to be any nonzero number other than 1, the expression makes sense but gives the wrong answer; if you take 0! to be 1, you get the right answer.)

For those who prefer computer science to probability, the number of ways to form a bit-string consisting of *a* 0’s and *b* 1’s is

(*a* + *b*)! / (*a*! × *b*!)

(Example: When *a* = *b* = 2, the expression evaluates to 24 / 4 = 6, and the 6 bit-strings consisting of two 0’s and two 1’s are the binary words **0011**, **0101**, **0110**, **1001**, **1010**, and **1100**. If you know a good online proof, I’d love to include a link to it.) In the case where *a* is positive and *b* is zero, you can check that the number of such bit-strings is 1, and that the above expression takes the value 1 only if 0! is defined to be 1.
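Here’s a quick empirical check in Python (my own sketch, not from the original text), which also handles the degenerate case of no bits at all:

```python
from itertools import permutations
from math import factorial

def bit_strings(a, b):
    # All distinct strings with a 0's and b 1's.
    return {"".join(p) for p in permutations("0" * a + "1" * b)}

print(sorted(bit_strings(2, 2)))  # the six strings listed above
print(len(bit_strings(2, 2)),
      factorial(4) // (factorial(2) * factorial(2)))  # 6 6
print(bit_strings(0, 0))  # one string: the empty one
```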

It’s fun to contemplate the case where *a* and *b* are both 0. For people who study probability, this corresponds to a gambling game in which, just before you make your first toss, you remember what your parents told you about gambling, suddenly “need to use the bathroom”, and sneak out the back. If you do this, you’re 100% certain to toss 0 heads and 0 tails, so the probability of that event is 1. For computer scientists, *a*=*b*=0 corresponds to the bit-string consisting of no bits at all, often written as λ instead of as .^{4} The bit-string λ has length 0, but it’s a mathematical entity, and there’s 1 of it.

So now we can ask: are there 0! layered salads with 0 ingredients? One way to define a layered salad (in the somewhat strange way I’m using the term) is a collection of edible leaves such that (a) each type of leaf that occurs must occur only once, and (b) any two leaves that occur must occur in some definite order. In a persnickety mathematical sense, the empty salad satisfies both conditions vacuously, because (a) you can’t show me a leaf that occurs more than once, and (b) you can’t show me two leaves that *don’t* occur in some definite order. Therefore, there’s 1 (i.e., 0!) layered salad with 0 ingredients.
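Python’s standard library agrees (a sketch of mine): `itertools.permutations` treats a layered salad as an ordering of the leaves, and it reports exactly one ordering of no leaves at all.

```python
from itertools import permutations

def layered_salads(leaves):
    # A layered salad is an ordering (a stacking) of the given leaves.
    return list(permutations(leaves))

print(len(layered_salads(["arugula", "basil", "celery"])))  # 6, i.e., 3!
print(layered_salads([]))  # [()] -- exactly one empty layered salad
```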

So if your fridge is empty, and you’re tired of tossed salad with 0 ingredients, you can vary your diet by having a *layered* salad with 0 ingredients instead.

**THE TILING THAT HAS NO TILES AND THE PATH THAT HAS NO STEPS**

Here’s another application of the one-way-to-do-nothing principle. How many ways are there to tile a 2-by-0 rectangle with identical 2-by-1 tiles? We’ll overcome our initial *stupor vacui*^{5} by broadening the question, replacing 0 by an unknown positive integer *n*; then, once we’ve seen what the governing pattern is, we’ll sneak up on 0 by counting backward from the positive integers, and then we’ll notice that the paradoxical answer we obtain actually makes a persnickety kind of sense.

The number of ways to tile a 2-by-1 rectangle with 2-by-1 tiles is 1:

The number of ways to tile a 2-by-2 rectangle with 2-by-1 tiles is 2:

The number of ways to tile a 2-by-3 rectangle with 2-by-1 tiles is 3:

The number of ways to tile a 2-by-4 rectangle with 2-by-1 tiles is 5:

Mathematician/dad/videographer Mike Lawler helped his child study this problem. For general values of *n* the answer is given by the Fibonacci sequence 1,2,3,5,8,13,…

Once you’ve understood why each term is equal to the sum of the two preceding terms (see Mike’s follow-up post on this topic, featuring both kids this time), you can turn the pattern around and say that each term is equal to the *difference* between the two *succeeding* terms. More precisely, the *n*th term in the sequence is equal to the (*n*+2)nd term minus the (*n*+1)st term. If we apply that formula in the case *n*=0, we see that the number of tilings of the 2-by-0 rectangle “should” be the number of tilings of the 2-by-2 rectangle minus the number of tilings of the 2-by-1 rectangle, which gives us 2 minus 1, or 1.

That is, the pattern suggests that there’s exactly 1 way to tile a 2-by-0 rectangle with 2-by-1 tiles, and if you think about it, that’s exactly right. The way to tile a 2-by-0 rectangle with 2-by-1 tiles is to say “There’s room for exactly 0 tiles, so I’m done” and then fold your arms. Or as Mike’s son puts it: the way to do it is *not to do anything*. And there’s exactly one way to do that.
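The recurrence can be run forward from the empty tiling; here’s a sketch of mine in Python:

```python
def tilings(n):
    # Ways to tile a 2-by-n rectangle with 2-by-1 tiles: the leftmost
    # column is covered either by one vertical tile (leaving a
    # 2-by-(n-1) rectangle) or by two horizontal tiles (leaving a
    # 2-by-(n-2) rectangle).
    if n == 0:
        return 1  # the empty tiling: room for 0 tiles, so we're done
    if n == 1:
        return 1
    return tilings(n - 1) + tilings(n - 2)

print([tilings(n) for n in range(7)])  # [1, 1, 2, 3, 5, 8, 13]
```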

As important in modern combinatorics as the Fibonacci numbers are the Catalan numbers 1,2,5,14,42,… (discussed in a video listed in the References). One thing Catalan numbers count is paths in a grid that stay in a triangle. For instance, 42 (the 5th Catalan number) counts the paths in the picture below that join the point *A* = (0,0) to the point *B* = (5,5) consisting of 10 steps that stay within the triangle bounded by dashed lines.

The *n*th Catalan number is equal to the number of paths from (0,0) to (*n*,*n*) consisting of 2*n* steps that stay within the triangle bounded by (0,0), (*n*,0), and (*n*,*n*). A formula for the *n*th Catalan number is

(2*n*)! / (*n*! × (*n*+1)!)

Plugging in *n*=0, we get the answer 1. This counts the number of paths from (0,0) to (0,0) consisting of 0 steps that stay within the triangle bounded by (0,0), (0,0), and (0,0). The three vertices of this triangle are the exact same point, so this region gives you no room to move; but since we’re looking at paths of length 0 within that region, there’s no need to move. Your journey is over as soon as it’s started.
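To watch the *n* = 0 case emerge alongside its bigger siblings, here is a brute-force path-counter in Python (my own sketch):

```python
def catalan_paths(n):
    # Count paths of 2n unit steps (right or up) from (0, 0) to (n, n)
    # that never rise above the diagonal, i.e., always keep y <= x.
    def walk(x, y):
        if (x, y) == (n, n):
            return 1  # arrived -- including the 0-step journey when n = 0
        count = 0
        if x < n:
            count += walk(x + 1, y)  # step right
        if y < x:
            count += walk(x, y + 1)  # step up, staying in the triangle
        return count
    return walk(0, 0)

print([catalan_paths(n) for n in range(6)])  # [1, 1, 2, 5, 14, 42]
```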

**MATH MINDFULNESS**

I’ve come about as close to talking about the null set as one can without actually talking about it. I’ll write about the null set some other time; in the meantime, you can read Evelyn Lamb’s fun essay on the topic, which will give you some insight into the body of mathematical work Ben Orlin is riffing on in his cartoon back at the beginning of this piece where he describes the null salad as “the basis of all salads” in the set-theoretic sense.

The null salad (tossed or layered), the empty binary word λ, the tiling with no tiles and the path with no steps inhabit the no-man’s-land between Nothingness and Somethingness, and in discussing them I’m trespassing onto territory claimed by mystics. So I’ll close by proposing, with tongue firmly in cheek, a mathematico-mystical morning meditation practice that will help novices come to deeply understand the mathematics of Doing By Not-Doing.

Your ritual is to prepare and consume null salad. Choose your ingredients, which you need not have on hand; lack of ingredients does not matter, since the recipe requires none of them. Perhaps your choice today is a simple null fruit salad, consisting of 0 apples. After preparing your salad, perform arithmetic operations on your salad that in no way mar its perfect null-ness. For instance, add 0 apples to your salad. Then subtract 0 apples. Then double your portion. Then halve it. It may seem impossible to cut your apple salad in half because there are no apples to cut nor any knife in hand to cut them with, but that does not make the task impossible; indeed, as soon as the task comes into your mind, the task is already completed.

As you carry out these operations, including the final step of consuming the salad in zero mouthfuls, repeat in your mind the mantra λ, The Empty Word, whose meaning is: “There is nothing to be done, and I have already done it.”

Feel enlightened?

Good. Now go eat something.

Next month: What Proof Is Best?

*Thanks to Sandi Gubin, David Jacobi, Joe Malkevitch, Henri Picciotto, and Evan Romer.*

**ENDNOTES**

#1. I would’ve written “salad days”, but that seemed too cheap a pun. Oops, I just wrote it anyway.

#2. The restaurant was Sweeney Todd’s Pizza in Cambridge, England, no longer in business. (Maybe some health inspectors heard disturbing rumors about what sort of meat went on their pizzas?) Incidentally, Silouan Winter has pointed out to me on Twitter that in southern Germany you can find restaurants that offer an empty plate for kids who partake of their parents’ food, called a “Räuberteller” (“robber’s plate”).

#3. I’m reminded of Salvador Dali’s speech in which he stood up, said “I shall be so brief I have already finished,” and then sat down. Or did he? Anyone who can find a source for this quote should post it in the Comments!

#4. See the StackExchange discussion of the origin of the symbol λ for the empty string.

#5. I’m coining the term “*stupor vacui*” in analogy with the existing phrase “*horror vacui*”. It’s intended to refer to the way the mind boggles when it confronts the vacuous, and in desperation clings to answers like “Zero!” or “Nonsense!” or “Impossible!” and rejects an incongruously non-vacuous answer like “Exactly one”.

**REFERENCES**

Alissa Crans, “A Surreptitious Sequence: The Catalan Numbers” (video), produced by the Mathematical Association of America.

Martin Gardner, Nothing; chapter 1 in “Mathematical Magic Show”.

Martin Gardner, More Ado About Nothing; chapter 2 in “Mathematical Magic Show”.

Martin Gardner, Fibonacci and Lucas Numbers; chapter 13 in “Mathematical Circus”.

Martin Gardner, Catalan Numbers; chapter 20 in “Time Travel and Other Mathematical Bewilderments”.

Evelyn Lamb, A Few of My Favorite Spaces: The Empty Set.

It’s not hard to see that this idea has serious limitations. For instance, even though many legal issues surrounding abortion hinge on different definitions of the word “life”, when it comes to the moral side of the debate, definitions don’t change anyone’s mind. Usually we each choose the definition that matches an outcome we’ve decided on, not the other way around. But in mathematics (thank goodness for the consolations of math!), things are different.

Definitions have been on my mind lately for two reasons: I’m teaching lots of definitions to the students in my discrete mathematics course, and I’ve been reading about the work of Kevin Buzzard and his collaborators, who have been teaching lots of definitions to a computer program for doing mathematics.

Definitions are nothing new in mathematics — Euclid’s *Elements* starts with a few, such as “a point is that which hath no part”. But surprisingly, Euclid’s proofs don’t make much use of the initial definitions; it’s the axioms (and the later definitions) that do the heavy lifting.^{1} One modern point of view about this paradoxical situation is that even though Euclid’s first definitions give readers a way to *think* about points, lines, planes, etc., it’s the axioms that implicitly tell us what these mathematical objects *are*. That is, at an abstract level, Geometry Is As Geometry Does, and a Euclidean “point”, rather than being that-which-hath-no-part, is any object of thought whose properties in relation to other points (and lines and planes) obey Euclid’s axioms. Under this perspective, many mathematical definitions can look a bit circular, though “relational” would be a more apt term.

In a mathematical treatise like Euclid’s, you don’t get all the definitions at the beginning; they’re peppered throughout, with later definitions depending on earlier ones. You could try to read all the definitions at the start, but aside from the fact that you’d overload your brain, a lot of the definitions, read in isolation, would seem arbitrary. You’d be missing the way that those definitions give rise to interesting theorems that retroactively justify those definitions. Amid all the definitions one *could* make, some are more natural, interesting, or useful than others, and it’s not clear from the start what those definitions will be. For instance, until you have some experience multiplying and factoring numbers, it may not be clear why the concept of prime numbers matters; and I think it’s only when you’ve seen the unique factorization theorem that the true significance of the prime numbers comes into view.

I like what the late mathematician and educator Charles Wells wrote about definitions:

Some students don’t realize that a definition gives a magic formula — all you have to do is say it out loud. More generally, the definition of a kind of math object, and also each theorem about it, gives you one or more methods to deal with the type of object.

For example,

*n* is a prime by definition if *n* > 1 and the only positive integers that divide *n* are 1 and *n*. Now if you know that *p* is a prime bigger than 10 then you can say that *p* is not divisible by 3 because the definition of prime says so. (In Hogwarts you have to say it in Latin, but that is no longer true in math!) Likewise, if *n* > 10 and 3 divides *n* then you can say that *n* is not a prime by definition of prime.

You now have a magic spell — just say it and it makes something true!

What the operability of definitions and theorems means is: A definition or theorem is not just a static statement, it is a weapon for deducing truth.
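Wells’s point about operability comes through nicely if you transcribe the definition directly into code; here is a sketch of mine in Python:

```python
def is_prime(n):
    # The definition, transcribed: n > 1 and the only positive
    # integers dividing n are 1 and n.
    return n > 1 and all(n % d != 0 for d in range(2, n))

# "Casting the spell": any prime bigger than 10 is not divisible by 3,
# because the definition says its only divisors are 1 and itself.
assert all(p % 3 != 0 for p in range(11, 1000) if is_prime(p))
print(is_prime(2), is_prime(9), is_prime(97))  # True False True
```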

The role of definitions changes as one advances in one’s mathematical education. Some of the change is quantitative. New definitions build on earlier definitions, which build on even earlier definitions, and so on; the more math you learn, the taller your personal definition-tower becomes. The same is true on a communal level: understanding a recent definition like Peter Scholze’s notion of “perfectoid spaces” requires understanding dozens if not hundreds of concepts that the definition builds upon, to which thousands of mathematicians have contributed.

But a less obvious, qualitative difference is that *sometimes definitions don’t even make sense without certain theorems*. That is, a definition may make a tacit claim, and proving the claim may take hard work. Many definitions in advanced mathematics are like this.

Sometimes the tacit claim is *existence* or *uniqueness*. As a non-mathematical example, notice that the phrase “the Prime Minister of the United States” makes no sense; neither does the phrase “the baseball team of the United States”, but for a different reason. The first phrase doesn’t denote an actual person, because the U.S. has no Prime Minister; the second phrase doesn’t denote an actual team, because the U.S. has many baseball teams. In cases like these, the use of the word “the” followed by a singular noun or noun-phrase requires that its referent *exist* and that its referent be *unique*.^{2}

Here’s a mathematical example: when we say “The infinite decimal .333… is defined as the unique number that lies in the intervals [.3,.4], [.33,.34], [.333,.334], etc.”, we’re asserting simultaneously that there is *at least* one such number and that there is *at most* one such number. Many mathematical definitions share this property of making tacit claims; proving these claims requires a side-bar. Those proofs may in turn depend on other theorems, and other definitions, which in turn depend on other theorems. So if you could look inside someone’s brain and somehow see a definition as literally sitting atop earlier definitions, the tower wouldn’t consist merely of definitions; there’d be theorems mixed in there too.
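The nested-interval claim is easy to probe numerically; here is a sketch of mine using exact rational arithmetic:

```python
from fractions import Fraction

def interval(k):
    # The k-th interval [0.3...3, 0.3...4], with k threes after the
    # decimal point.
    lo = Fraction(10**k // 3, 10**k)
    return lo, lo + Fraction(1, 10**k)

third = Fraction(1, 3)
for k in range(1, 10):
    lo, hi = interval(k)
    assert lo <= third <= hi  # existence: 1/3 lies in every interval
# Uniqueness: the interval widths 1/10**k shrink toward 0, so no two
# distinct numbers can both lie in all of the intervals.
print(interval(2))
```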

Another wrinkle in advanced mathematics is that many concepts are mergers of several different-looking concepts that turn out to be equivalent for non-obvious reasons. One might see a passage of an advanced math textbook that looks something like this:

**Theorem**: Let *F* be a foo [where a “foo” is some previously-defined kind of mathematical object]. Then the following conditions are equivalent:

(1) …

(2) …

(3) …

**Proof**:

(1) implies (2): …

(2) implies (3): …

(3) implies (1): …

**Definition**: Any foo satisfying conditions (1), (2), and (3) is called a *fnord*.

Which of the three conditions is the “true” definition of fnordness? All of them! Which of the three conditions one should focus on will depend on context.^{3} This situation crops up so often in mathematics that the initialism “TFAE” (for “The Following Are Equivalent”) has become a standard part of a mathematician’s education in the English-speaking world. (If any of you know corresponding initialisms in French or other languages, please post them in the Comments!)

Someone at the frontier of mathematical research may wind up in a situation where there are multiple conditions that are not *quite* equivalent, and must choose which one to canonize as the “right” definition. This requires a certain amount of prescience about the direction of future developments, and since mathematical history (like any other kind) has a way of surprising the people who live through it, sometimes mathematicians get it wrong. For instance, once upon a time it seemed natural to define the word *prime* so as to include the number 1, but nowadays mathematicians agree that, given the directions that number theory has gone in, it’s best to call 1 neither prime nor composite, but to call it a *unit*.^{4}

There are interesting issues about how to tweak an existing definition to handle borderline cases (see the discussion of 0^0 in my May 2019 essay), but a higher order of creativity comes from devising (good) new definitions. Ideally a new definition should enable the creator of the definition to solve existing problems while introducing new directions for future research.

Here’s Kevin Buzzard writing about the research interests of the people he works alongside at Imperial College in London, and contrasting the definition of perfectoid spaces (and other hot topics) with the less fashionable notion of Bruck loops, about which I’ll say nothing except to mention that Buzzard defines them in the space of one long paragraph, thereby demonstrating that one *can* define them succinctly, under a suitable definition of succinctness:

I work in a mathematics department full of people thinking about mirror symmetry, perfectoid spaces, canonical rings of algebraic varieties in characteristic *p*, etale cohomology of Shimura varieties defined over number fields, and the Langlands philosophy, amongst other things. Nobody in my department cares about Bruck loops. People care about objects which it takes an entire course to define, not a paragraph.

So what are the mathematicians I know interested in? Well, let’s take the research staff in my department at Imperial College. They are working on results about objects which in some cases take hundreds of axioms to define, or are even more complicated: sometimes even the definitions of the objects we study can only be formalised once one has proved hard theorems. For example the definition of the canonical model of a Shimura variety over a number field can only be made once one has proved most of the theorems in Deligne’s paper on canonical models, which in turn rely on the theory of CM abelian varieties, which in turn rely on the theorems of global class field theory. That’s the kind of definitions which mathematicians in my department get excited about — not Bruck Loops.

When a new definition like “perfectoid spaces” garners professional acclaim for the person who came up with it, and word gets out, it’s natural for the scientifically-interested public to want someone to tell them what the fuss is about. And this is where things get tricky. For the expert, the new definition is at the top of a personal Jenga-tower in their brain. There simply isn’t time to build a copy of that tower, or even a streamlined version of it, in the reader’s brain. There needs to be something simpler, and a certain amount of distortion is inevitable.^{5}

Some writers resort to metaphor. Others connect the new concept with concepts slightly lower in the Jenga-tower, treating them all as black boxes and explaining how they relate to one another, saying things like “[Concept X] unifies [Concept Y] with [Concept Z]” without ever explaining the details of Concepts X, Y, and Z. Still others despair of explaining the math and resort to biography (e.g., “The crucial insight finally came to her while she was scuba-diving during her honeymoon in Australia”).^{6}

To see what happened when Michael Harris accepted the challenge of trying to explain Scholze’s perfectoid spaces to a general scientific readership, read his essay “Is the tone appropriate? Is the mathematics at the right level?”, and for comparison read Gilead Amit’s essay “The Shape of Numbers” that *New Scientist* decided to publish instead of what Harris wrote. And then you’ll understand why I’ve decided to be a math essayist rather than a math journalist (much as I admire mathematicians who step into the fray).

I can’t say I understand Harris’ essay more than superficially. I’m intrigued by the idea that Scholze’s theory of diamonds allows you to “clone” a prime, but what does that really mean? Maybe if I’d studied Spec(**Z**) back in grad school (or if I took the time to learn about it now) I’d have a clue. And while we’re talking about Spec(**Z**) (or rather talking about not talking about it), I’ve always wondered what diophantine algebraic geometers mean when they say primes are like knots; I hope I’ll understand this someday!

Frank Quinn wrote an essay that has a very nice passage about the role of definitions in modern mathematics:

Definitions that are modern in this sense were developed in the late 1800s. It took awhile to learn to use them: to see how to pack wisdom and experience into a list of axioms, how to fine-tune them to optimize their properties, and how to see opportunities where a new definition might organize a body of material. Well-optimized modern definitions have unexpected advantages. They give access to material that is not (as far as we know) reflected in the physical world. A really “good” definition often has logical consequences that are unanticipated or counterintuitive. A great deal of modern mathematics is built on these unexpected bonuses, but they would have been rejected in the old, more scientific approach. Finally, modern definitions are more accessible to new users. Intuitions can be developed by working directly with definitions, and this is faster and more reliable than trying to contrive a link to physical experience.

I’ll end by quoting Peter Scholze himself:

What I care most about are definitions. For one thing, humans describe mathematics through language, and, as always, we need sharp words in order to articulate our ideas clearly. For example, for a long time, I had some idea of the concept of diamonds. But only when I came up with a good name could I really start to think about it, let alone communicate it to others. Finding the name took several months (or even a year?). Then it took another two or three years to finally write down the correct definition (among many close variants). The essential difficulty in writing “Etale cohomology of diamonds” was (by far) not giving the proofs, but finding the definitions. But even beyond mere language, we perceive mathematical nature through the lenses given by definitions, and it is critical that the definitions put the essential points into focus.

*Thanks to Sandi Gubin for help with this piece.*

Next month: The Null Salad.

**ENDNOTES**

#1. Sometimes Euclid’s definitions and axioms also hinge on unstated assumptions whose tacit role only came into view many centuries later, but that’s another story.

#2. The distinction between “a” and “the” came to my attention many years ago when I was touring the parts of the Mormon Tabernacle that are open to the public, and the tour guide said “The people who settled Utah were hard-working people; that’s why we call Utah a beehive state.” I asked her whether she meant “the”, since after all Utah is often called *The* Beehive State, but she said she meant “a”. I think she chose the indefinite article to focus her listeners on what kind of people Utahans were and are, rather than inviting comparisons between Utahans and non-Utahans.

#3. Following up on the definition of fnords as special kinds of foos, there might be other theorems, such as “If *F*_{1} and *F*_{2} are fnords, then so is *F*_{1}+*F*_{2}” (assuming that addition of foos has already been defined). A happy asymmetry comes to the aid of someone trying to prove such a theorem: since *F*_{1} and *F*_{2} are (by hypothesis) fnords, each of them satisfies *all three* of the magic fnord-properties, so all three properties may be legitimately assumed; but to prove that *F*_{1}+*F*_{2} is a fnord too, it suffices to prove *just one* of the properties, since the other two come along for free, thanks to the Theorem.

#4. It hasn’t escaped my attention that, in a sense, letting the needs of mathematicians dictate the definitions mathematicians use is not entirely different from the way people let their verdicts on issues determine the definitions they use. People who condone abortion will define life one way, while people who condemn it will use a different definition. In a similar way, number theorists who value the unique factorization of numbers into primes will want to deny primeness to 1, while number theorists who couldn’t care less about unique factorization will — wait a minute, there are no number theorists like that! At least, none that I know of. The fact that number theorists call this result the Fundamental Theorem of Arithmetic tells you right away that there’s unanimity on that point. But, hypothetically, if there were a community of mathematicians who wanted to consider 1 to be prime, it wouldn’t cause a huge rift; we’d just need to introduce a second term, maybe “prome” or “primish”, to carry the variant meaning.

#5. The recently deceased mathematician John Tate, who laid much of the early groundwork for Scholze’s work over half a century ago in one of the most revolutionary doctoral theses of all time, was glumly resigned to the difficulty of conveying to non-mathematicians what he did for a living or why it mattered. His obituary quotes him as saying:

Unfortunately it’s only beautiful to the initiated, to the people who do it. It can’t really be understood or appreciated on a popular level the way music can. You don’t have to be a composer to enjoy music, but in mathematics you do. That’s a really big drawback of the profession. A non-mathematician has to make a big effort to appreciate our work; it’s almost impossible.

#6. I’m not actually aware of any mathematician making a crucial discovery during their honeymoon, but I’d bet it’s happened; I only hope that the mathematician had the restraint to wait until the end of the honeymoon before starting to write it up.

**REFERENCES**

Gilead Amit, “The shape of numbers”, posted as “‘Perfectoid geometry’ may be the secret that links numbers and shapes”, April 25, 2018.

Kevin Buzzard, “A computer-generated proof that nobody understands”, posted July 6, 2019.

Michael Harris, “Is the tone appropriate? Is the mathematics at the right level?”, posted around June 1, 2018.

Michael Harris, “The perfectoid concept: Test case for an absent theory”.

Frank Quinn, “A Revolution in Mathematics? What Really Happened a Century Ago and Why It Matters Today”, Notices of the American Mathematical Society, January 2012.


First I’ll play nice. I’m thinking of an infinite binary sequence that begins 0101101010110101… How will you guess the next bit? My infinite sequence happens to repeat with period seven, but if you didn’t know that ahead of time, what sort of bit-prediction method would you use? More importantly, how would you get a computer to predict successive bits and learn from its mistakes? This kind of question is relevant to data-compression.

A simple-minded but curiously effective general procedure for predicting the next bit involves looking at the different-length *suffixes* of the currently-known part of the sequence, where the suffix of length *k* consists of the last *k* bits, and looking for earlier occurrences of those patterns in the sequence. For instance, the suffix of length 3 in 0101101010110101 is the pattern 101. This pattern has occurred earlier in the sequence (several times, in fact). A longer suffix is 0101, which has also occurred earlier in the sequence. The curiously effective procedure for predicting the next bit requires that you first identify the *longest* suffix that has occurred earlier in the sequence. That suffix happens to be 010110101:

0101101**010110101** (suffix of length 9)

**010110101**0110101 (same pattern, seen earlier)

Call the longest previously-seen suffix S. The curiously effective procedure prescribes that, having found S, you must locate the previous (i.e., second-to-last) occurrence of S. Then you must see what bit occurred immediately after the previous occurrence of S:

**010110101**__0__110101

In this case, that bit is a 0, so you guess that the next bit will be 0.

0101101**010110101**__0?__

This guess is right, and that’s no accident: as long as I’ve picked a *periodic* sequence of bits (and I did), your use of this simple-minded method guarantees that you’ll guess all bits correctly from some point on, even if you don’t know what the period is.
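
The rule is easy to mechanize. Here’s a minimal Python sketch (the function name is my own; I’ve also allowed the empty suffix of length 0 as a fallback, so the rule always has something to match):

```python
def predict_next_bit(s):
    """Guess the next bit of s by the longest-previously-seen-suffix rule:
    find the longest suffix of s that also occurs earlier in s, locate the
    second-to-last occurrence of that suffix, and return the bit that
    immediately followed it.  The empty suffix (length 0) is allowed as a
    fallback, so some suffix always matches."""
    n = len(s)
    for k in range(n - 1, -1, -1):
        suffix = s[n - k:]  # the empty string when k == 0
        # rfind over s[0:n-1] finds the latest occurrence other than the
        # final one (any match must end strictly before the end of s).
        i = s.rfind(suffix, 0, n - 1)
        if i != -1:
            return s[i + k]

print(predict_next_bit("0101101010110101"))  # 0
```

For the periodic sequence above, the function recovers the guess 0, just as the hand computation did.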

Okay, now it’s time for me to stop playing nice. I’m thinking of an infinite sequence whose first sixteen terms are 0100110101110001. (Spoiler: it’s called the Ehrenfeucht-Mycielski sequence, and you can watch me construct the first dozen terms in a Barefoot Math video I just posted.) What’s your guess for the seventeenth term? The longest suffix that’s occurred previously is 001,

01**001**10101110**001**

and the previous occurrence of 001 was followed by a 1,

01**001**__1__0101110001

so if you follow the procedure described above you’ll guess that the next bit is 1.

0100110101110**001**__1?__

“Wrong,” I say; “the next bit is 0. This really isn’t your day; that’s the seventeenth straight time that you’ve been wrong!”

I’m being mean to you, but I’m not changing my mind about the sequence as I go; what I had in mind from the start was the infinite sequence of bits that will make all your guesses wrong. (I can do this because I know what prediction method you’re using; this enables me to front-load all my meanness and just go on autopilot after starting the game.) This sequence was invented by Andrzej Ehrenfeucht and Jan Mycielski in 1992, and is described in a nice 2003 article by Klaus Sutner. (I’ve cut some corners in my explanation; to really do it properly, we need to allow the empty string to be considered the “suffix of length 0”. The Wikipedia page gives a more rigorous treatment.)
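
Since the construction just negates the prediction at every step, the whole sequence is easy to generate. Here’s a sketch (the function name is my own, and I’ve again allowed the empty suffix, per the convention just mentioned):

```python
def ehrenfeucht_mycielski(n):
    """Return the first n bits of the Ehrenfeucht-Mycielski sequence.

    Start with "0"; thereafter, append the opposite of the bit that the
    longest-previously-seen-suffix rule predicts (the empty suffix is
    allowed, so a prediction always exists)."""
    s = "0"
    while len(s) < n:
        m = len(s)
        for k in range(m - 1, -1, -1):
            i = s.rfind(s[m - k:], 0, m - 1)  # latest earlier occurrence
            if i != -1:
                predicted = s[i + k]
                break
        s += "0" if predicted == "1" else "1"
    return s

print(ehrenfeucht_mycielski(16))  # 0100110101110001
```

Running it reproduces the sixteen bits above, including the spiteful seventeenth-guess-defeating 0 that comes next.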

It’s believed that the sequence exhibits “normality”: that is, it’s believed that half of the bits are 0s and half are 1s, that the four patterns 00, 01, 10, and 11 each appear a quarter of the time, that the eight patterns 000, 001, …, 110, and 111 each appear one-eighth of the time, etc. What’s more, the sequence seems to have distinct eras, where during the *m*th era the sequence is “trying” to make sure that each of the 2^{*m*} different possible patterns of length *m* occurs equally often; by the end of each era, the discrepancy between the number of 0s and the number of 1s seems to have gotten smaller.

Or at least, the discrepancy *appears* to be smaller. Sutner’s simulations ran for millions of steps, and it’s possible for us to go farther now, but mere calculation can never tell us what really happens out near infinity where the trains don’t run. Mathematicians believe that for all large enough *n*, the number of 0’s in the first *n* bits of the Ehrenfeucht-Mycielski sequence differs from *n*/2 by less than sqrt(*n*) divided by one million. In fact, they believe that the preceding sentence remains true if you replace “million” by “billion” or any larger number you like (though the meaning of “large enough” will need to be adjusted accordingly). But: not only have they *not* proved this, they don’t even know how to prove that this claim is true if sqrt(*n*)/1,000,000 is replaced by the much larger number *n*/1,000,000. They haven’t proved that the asymptotic density of 0’s (or 1’s) is 1/2. This is the notorious Ehrenfeucht-Mycielski balance problem, and it’s been an open problem for over twenty-five years.

The Ehrenfeucht-Mycielski sequence exhibits the phenomenon of quasirandomness, weaker than pseudorandomness but still quite interesting. One of the frustrations of the study of quasirandom processes is the persistent gap between what we can guess and what we can prove. As Paul Erdős said of the Collatz Conjecture, “Mathematics may not be ready for such problems.”

But that’s a defeatist attitude, and I want to end on a positive note. So let me announce here that, after much work, it’s been shown by Kieffer and Szpankowski that the asymptotic density of 0’s and 1’s in the Ehrenfeucht-Mycielski sequence, if it exists, must lie between 1/4 and 3/4.

Oh, so you think it must be easy to prove that the density exists? Guess again.

*Thanks to Bill Gasarch, Cris Moore, Joel Spencer and Klaus Sutner.*

Next month: Let Us Define Our Terms.

**REFERENCES**

John C. Kieffer and W. Szpankowski, “On the Ehrenfeucht-Mycielski Balance Problem”, https://dmtcs.episciences.org/3542/pdf

Klaus Sutner, “The Ehrenfeucht-Mycielski Sequence”, http://www.cs.cmu.edu/~sutner/papers/CIAA-Sutner-2003.pdf

Obviously not. But is there a way to pack more than 4000 disks of diameter 1 into a 2-by-2000 rectangle?

Again, obviously not — except that there is a way! (See my essay “Believe It, Then Don’t” for details.) So packing problems can be tricky.

If you’re packing equal-size disks into a large region, it’s intuitive that the best way to pack them is six-around-one. This is the *hexagonal packing*, and László Fejes Tóth showed in 1940 that it’s the best way to pack the infinite plane. Here the word “best” needs to be unpacked (pardon the pun), since both the hexagonal packing and the square packing fit infinitely many disks into the plane.

The way in which the hexagonal packing beats the square packing is that the former fills about 91% of the plane (more precisely *π* sqrt(3) / 6) while the latter fills only about 79% of the plane (more precisely *π*/4). That is, the hexagonal packing has a larger *packing fraction*. To compute the packing fraction of the square packing, divide the plane into 2*r*-by-2*r* squares where *r* is the radius of the disks.

Each square has area (2*r*)^{2} = 4*r*^{2} and contains a disk of area *π**r*^{2}, so the packing fraction in each square is (*π**r*^{2}) / (4*r*^{2}) = *π*/4 (notice that the radius drops out of the formula). Likewise you can compute the packing fraction of the hexagonal packing by dividing the plane into hexagons.
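
Both packing fractions are easy to check numerically. In the hexagonal case the natural cell is a regular hexagon whose inradius is *r* (so the inscribed disk just touches its sides), which has area 2 sqrt(3) *r*^{2}. A quick sketch:

```python
import math

r = 1.0  # any radius works: it cancels out of both ratios

# Square packing: one disk of radius r per 2r-by-2r square.
square_fraction = (math.pi * r**2) / (2 * r) ** 2

# Hexagonal packing: one disk of radius r per regular hexagon with
# inradius r, hence side 2r/sqrt(3) and area 2*sqrt(3)*r**2.
hex_fraction = (math.pi * r**2) / (2 * math.sqrt(3) * r**2)

print(round(square_fraction, 4), round(hex_fraction, 4))  # 0.7854 0.9069
```

As promised, the radius drops out, and the two fractions agree with *π*/4 ≈ 79% and *π* sqrt(3)/6 ≈ 91%.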

What about packing spheres in three-dimensional space?

Let’s write points as triples of numbers. If we take all the points (*a*,*b*,*c*) for which *a*, *b*, and *c* are integers, no two of the points are closer than distance 1, so we can put spheres of radius 1/2 around each of them and the spheres don’t overlap. (If two points are at distance *d* from one another, spheres of radius *d*/2 centered at the two points will touch but won’t overlap.) The sphere centered at (*a*,*b*,*c*) is tangent to the spheres centered at the six points (*a*±1,*b*,*c*), (*a*,*b*±1,*c*), and (*a*,*b*,*c*±1). This gives us the *cubical packing*, and it covers about 52% (more precisely *π*/6) of 3-dimensional space.

Not bad! But it turns out that if you throw out half of the spheres and inflate the rest, you can get a bigger packing fraction.

Here’s how it works. Imagine painting those spheres red and blue, where a sphere centered at (*a*,*b*,*c*) is red if *a*+*b*+*c* is even and blue if *a*+*b*+*c* is odd. Each red sphere touches six blue spheres. If we cull the blue spheres, then no red sphere touches any other sphere, so there’s room for us to expand the radii of the red spheres. By how much? The nearest neighbors of the red sphere centered at (*a*,*b*,*c*) are now the twelve red spheres centered at (*a*±1,*b*±1,*c*), (*a*±1,*b*,*c*±1), and (*a*,*b*±1,*c*±1); these twelve points are all at distance sqrt(2) from the point (*a*,*b*,*c*), so we can expand the radii of the red spheres from 1/2 to sqrt(2)/2 and the red spheres still won’t overlap (though they will graze each other). This is a win because the volume of a sphere grows like the cube of its radius. That is, even though the packing fraction went down by a factor of 2 when we culled the blue spheres, it went up by a factor of (sqrt(2))^{3} = 2 sqrt(2) > 2 when we inflated the red spheres. So our new culled-and-inflated packing has a bigger packing fraction (bigger by a factor of sqrt(2)), namely *π* sqrt(2) / 6, or about 74%.
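
The arithmetic of this cull-and-inflate trade is worth checking: halving the density and then multiplying each sphere’s volume by (sqrt(2))^{3} = 2 sqrt(2) nets a gain of sqrt(2). A quick sketch:

```python
import math

def sphere_volume(r):
    """Volume of an ordinary 3-dimensional sphere of radius r."""
    return (4 / 3) * math.pi * r ** 3

# One sphere of radius 1/2 per unit cube of space.
cubical = sphere_volume(0.5)

# Cull half the spheres (density / 2), then inflate radii from 1/2 to
# sqrt(2)/2, multiplying each sphere's volume by (sqrt(2))**3 = 2*sqrt(2).
culled_inflated = (cubical / 2) * math.sqrt(2) ** 3

print(round(cubical, 4), round(culled_inflated, 4))  # 0.5236 0.7405
```

The two printed fractions are *π*/6 (about 52%) and *π* sqrt(2)/6 (about 74%), and their ratio is exactly sqrt(2).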

Is this packing the best possible? Johannes Kepler thought it was, and for centuries, nobody could find a better packing but nobody could prove that there wasn’t one. It wasn’t until 2005 that Thomas Hales published a proof that Kepler’s packing fraction can’t be improved.

What about packing spheres in four-dimensional space? What would that even mean? If we relinquish visualization and rely on analogy, we can just *define* four-dimensional space as the set of quadruples (*a*,*b*,*c*,*d*) of real numbers, and *define* the distance between two quadruples (*a*,*b*,*c*,*d*) and (*a’*,*b’*,*c’*,*d’*) as

sqrt( (*a* − *a’*)^{2} + (*b* − *b’*)^{2} + (*c* − *c’*)^{2} + (*d* − *d’*)^{2} ),
and so on. Then we can use the same trick that worked in three dimensions. Take a hypersphere centered at each point (*a*,*b*,*c*,*d*) where *a*,*b*,*c*,*d* are integers, and paint it red or blue according to whether *a*+*b*+*c*+*d* is even or odd. If we cull the blue hyperspheres and inflate the red hyperspheres by a factor of sqrt(2), we get a packing that’s exactly twice as dense as the 4-dimensional hypercubical packing. This is called the *D*_{4} packing, and it’s believed to be optimal, but nobody has proved it.
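
In fact the same checkerboard trick can be tried in any dimension *n*: culling halves the density, while inflating every radius by sqrt(2) multiplies each ball’s volume by 2^{*n*/2}, for a net factor of 2^{(*n*/2)−1}. That factor is 1 in dimension 2 (no gain), sqrt(2) in dimension 3, and exactly 2 in dimension 4, as claimed above. A sketch using the standard *n*-ball volume formula:

```python
import math

def ball_volume(n, r):
    """Volume of an n-dimensional ball of radius r:
    pi**(n/2) * r**n / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)

for n in (2, 3, 4):
    cubic = ball_volume(n, 0.5)            # integer-lattice packing, radius 1/2
    checker = (cubic / 2) * 2 ** (n / 2)   # cull half, inflate by sqrt(2)
    print(n, round(cubic, 4), round(checker, 4), round(checker / cubic, 4))
```

In dimension 4 the cubic packing fills *π*^{2}/32 of space and the checkerboard (*D*_{4}) packing fills *π*^{2}/16, twice as much.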

You might think you can guess how the rest of the story goes: 5-dimensional packing is even harder than 4-dimensional packing, 6-dimensional packing is harder still, and so on, forever. But no! That’s the thing I find most amazing about this story. For two special values of *n*, mathematicians have been able to prove that the densest *known* way to pack *n*-dimensional hyperspheres is in fact the densest *possible* way. These exceptional dimensions are *n*=8 and *n*=24. The proof for *n*=8 was found in 2016 by Maryna Viazovska, and the proof for *n*=24 was found a week later by Viazovska in collaboration with Henry Cohn, Abhinav Kumar, Stephen Miller, and Danylo Radchenko. Erica Klarreich’s article is a great place to look if you want to know more. Or check out Kelsey Houston-Edwards’ Infinite Series video about sphere-packing.

One thing I love about this topic is that despite the sophistication of the methods used by Viazovska and her collaborators, the subject is still in its infancy. The only dimensions we understand right now are 1, 2, 3, 8, and 24. I’m guessing that the 4-dimensional case is the one somebody will solve next, but who knows?

Next month: Guess Again: The Ehrenfeucht-Mycielski Sequence.

A pleasant dream has been replaced by your worst nightmare. But into your still-sleep-fogged conscious mind rises a catchphrase, your only chance for salvation. “Um… Take the derivative and set it equal to zero?”

The professor beams and tells the class “You see? It’s so easy, you can even do it in your sleep!”

This event took place nearly forty years ago (my friend Dan Ullman was a teaching assistant and saw the whole thing, though the function being maximized and the student’s dream were probably different). The story highlights one feature of doing “cookbook-calculus”: you can survive just following the recipes.

But what I want to talk about is what happens next in that classroom, and its broader significance. Because when you differentiate the function 6*x* − *x*^{3} you get the function 6 − 3*x*^{2}, and when you set this expression equal to 0 and simplify you get *x*^{2} = 2 — an equation that has no solution in the rational numbers.

And the square root of two is nice compared to some of the other beasts that await you in calculus class. If you take the derivative of 10^{x}, you get 10^{x} times the irrational number ln 10. Or if you take the derivative of the sine of *x* (where the angle *x* is expressed in degrees), you get the cosine of *x* times the irrational number *π*/180.

You can get these annoying multipliers to go away, but at a steep price: banishing the former requires accepting the irrational number *e* as the One True Base, while banishing the latter means worshipping at the altar of the Arch-Irrational, *π*. (I jest, of course.)

Why does calculus involve so many irrational numbers? A course in theoretical calculus (also called real analysis) can shed light on the issue. One digs underneath the function-concept to re-explore the number-concept, and one finds that the properties of the number-line discussed in high school simply don’t suffice to get the calculus-motor up and running. One needs an extra ingredient in the fuel, called the completeness property of the reals. To paraphrase Obi-Wan Kenobi: The completeness property of the reals is what gives calculus its power. It surrounds the set of real numbers and penetrates it. It binds the number line together.

The real number system, with its plethora of nasty irrational numbers, possesses the completeness property; the tidier rational number system lacks it.

Using the completeness property we can prove the Intermediate Value Theorem: If a continuous function is positive somewhere and negative somewhere, then it’s got to be zero somewhere. Invoking this theorem, we can prove that sqrt(2) exists, since the continuous function *x*^{2} − 2 is positive for some values of *x* and negative for others.

Likewise, the completeness property lets us prove the Extreme Value Theorem: A continuous function on a (closed, bounded) interval must achieve its maximum value somewhere. So on any such interval there must be a place at which the continuous function 6*x* − *x*^{3} achieves its maximum.

If we tried to do calculus using just rational numbers, neither the Intermediate Value Theorem nor the Extreme Value Theorem would be true: for, if we restrict *x* to rational values, *x*^{2} − 2 is sometimes positive and sometimes negative but never zero, while 6*x* − *x*^{3} approaches but never achieves a maximum value. (See this Stack Exchange discussion to learn more about what goes wrong with calculus when you try to use only rational numbers.)
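
You can watch the Intermediate Value Theorem fail over the rationals with exact arithmetic. Python’s `Fraction` type keeps every value exactly rational; in the sketch below, *x*^{2} − 2 changes sign on [1, 2] yet never hits zero on a rational grid, no matter how fine:

```python
from fractions import Fraction

def f(x):
    """The function x**2 - 2, evaluated exactly."""
    return x * x - 2

assert f(Fraction(1)) < 0 and f(Fraction(2)) > 0  # a sign change on [1, 2]

# Scan a fine rational grid: f never vanishes, because no rational
# number has square exactly 2.
step = Fraction(1, 10000)
x = Fraction(1)
while x <= 2:
    assert f(x) != 0
    x += step
print("sign change, but no rational zero on the grid")
```

Refining the grid only tightens the squeeze around sqrt(2); it never produces a rational root.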

There are other theorems in theoretical calculus that work for the real number system but fail for the rational number system: the Mean Value Theorem, the Bolzano-Weierstrass Theorem, the Heine-Borel Theorem, etc.

And now I can get to the real point of this essay. The theorems of the calculus that work for the reals but fail for the rationals, despite the very different claims they make, are all **equivalent to each other!** That is, if you’re dissatisfied with the rational number system and are looking to upgrade, and you go to the number-system store and say you want a number system that has *this* feature or *that* feature, most of the time the sales clerk will reply “Sounds like you want to buy the real numbers.” It’s a package deal; for the price of one feature, you get them all. (See this article to learn more about calculus theorems that are equivalent to the completeness property.)

As I mentioned last month, my favorite theorem-secretly-equivalent-to-completeness is the Constant Value Theorem: it says that if the derivative of a function is 0 everywhere, then the function must be constant. Obvious, right? But if you try to do calculus using the rational numbers in place of the real numbers as your “everywhere”, the theorem fails. To see why, look at the function *f*(*x*) = sign(*x*^{2} − 2): it’s +1 where *x*^{2} > 2 and it’s −1 where *x*^{2} < 2 (no fair asking “What about where *x*^{2} = 2?” since we’re restricting to rationals).

In ordinary calculus the function *f*(*x*) is discontinuous at *x* = sqrt(2), but if we’re restricting *x* to be rational, the function is continuous throughout its domain. In fact, the function is differentiable with derivative 0 throughout its domain. So the truth of the Constant Value Theorem depends on (and in fact is equivalent to) a subtle property of the real numbers that we don’t teach in most calculus classes but which makes calculus work.

Before I end I should probably subvert the jokey title of this essay a bit. The incursion of irrational numbers into mathematics long predates calculus. The ancient Greeks noticed that if you try to measure both a side and a diagonal of a square using the same measuring-stick, you’ll run into trouble. If you use half-a-side as your measuring unit, the diagonal isn’t quite three measuring units long, though the approximation isn’t bad; that’s because the diagonal is slightly less than 3/2 as long as the side. Likewise, if you use a-tenth-of-a-side as your unit, you’ll find that the diagonal is just a hair over fourteen units long; that’s because the diagonal is slightly more than 14/10 as long as the side. There’s no length-unit that does the job exactly, because there’s no fraction whose square is exactly two.

But the Greek mathematician Eudoxus of Cnidus realized that even though the question “For what whole numbers *m* and *n* is the diagonal of a square *m*/*n* times as long as the side of the square?” has no right answer, *each wrong answer is* *wrong in a consistent way, regardless of which square you’re looking at*. That is: whether the square you’re looking at is big or small, the ratio 3-to-2 is always too big and the ratio 14-to-10 is always too small; and so on. Eudoxus realized that being able to *compare* the square root of two with ordinary numbers like 3/2 and 14/10, and decide in each case which approximations are too big and which are too small, allows us to pin it down with whatever precision we want, and that this ability to compare is good enough not just for all practical purposes, but for all theoretical purposes as well.

Eudoxus’ solution to the problem of irrational numbers (or, expressed in less anachronistic terms, the problem of incommensurable magnitudes) is so satisfactory that most mathematicians dislike the way the mathematical sense of the word “irrational” chafes against its everyday meaning (at least in the languages I’m familiar with). Given that the word “irrational” means “unreasonable”, it’s reasonable for nonmathematicians to infer that irrational numbers must be resistant to human reason, but that natural surmise is a couple of millennia out of date. A number like the square root of two is only irrational in the sense that it can’t be expressed as a ratio of whole numbers (ir-ratio-nal; get it?). Likewise, calculus isn’t beyond the power of the human mind, though we humans are still working on the problem of how to teach it well to junior members of the species.

But calculus does have many surprises, not least of which is how certain irrational numbers like *π* keep popping up as the answer to many seemingly unrelated questions. Perhaps the most surprising thing about calculus, and about mathematics more broadly, is what physicist Eugene Wigner called the “unreasonable effectiveness” of mathematics in describing our universe. To plumb that mystery, we’ll need to find a vista from which we can have a clearer picture of what a different, less reasonable kind of universe might look like. Reason alone won’t take us there.

Next month: Sphere-Packing.

The celebration is long overdue.^{1} Calculus is one of the triumphs of the human spirit, and a demonstration of what perfect straight things (and perfect curvy things) can be made from the crooked timber of humanity. It’s given us a way of seeing order amidst the variety and confusion of reality, hand-in-hand with the expectation that when things happen, they happen for a reason, and that when surprising things happen, it’s time to look for new forces or additional variables.

One of my favorite theorems is a calculus theorem, but it’s not a theorem anyone talks about very much. It may seem mundane (if you’re mathematically sophisticated) or silly (if you’re not). It’s seldom stated, and when it *is* stated, it’s a lowly lemma, a plank we walk across on the way to our true destination. But it’s a crucial property that holds the real number line together and makes calculus predictive of what happens in the world (as long as we stay away from chaotic and/or quantum mechanical systems). It’s called the Constant Value Theorem, and it can be stated as a succinct motto: “Whatever doesn’t change is constant.” (This is not to be confused with the motto “Change is the only constant”, which happens to be the title of Orlin’s book.) I’ll tell you four things about this theorem that I find surprising and beautiful.

I spoke about the Constant Value Theorem earlier this month on the My Favorite Theorem podcast, so you might want to listen to that episode before or after you read this essay. In a playful vein, I suggest that if the Constant Value Theorem weren’t true, we’d live in a scarily unpredictable universe in which objects could jump around or change course willy-nilly. But it would be more accurate to say that without the Constant Value Theorem, calculus wouldn’t have the predictive power that’s made it such a successful tool in helping us model and master our world.

**THE CELESTIAL ORRERY**

Let’s start the story with mathematician Pierre-Simon Laplace. Laplace had a vision of the universe as a giant orrery, running on invisible tracks laid out by Newton’s laws. He wrote:

*Given for one instant an intelligence which could comprehend all the forces by which nature is animated and the respective positions of the beings which compose it, if moreover this intelligence were vast enough to submit these data to analysis, it would embrace in the same formula both the movements of the largest bodies in the universe and those of the lightest atom; to it nothing would be uncertain, and the future as the past would be present to its eye.*

Newton’s laws, as understood by Laplace, are expressed as equations describing the way various quantities change over time. Let *f*(*t*) represent some time-dependent quantity, where *t* represents time; then *f’*(*t*) (called the derivative of *f*(*t*) with respect to time) is defined to be the rate at which *f*(*t*) is changing at time *t*. For instance, if *f*(*t*) represents the position of some object at time *t*, then *f’*(*t*) represents the rate at which the position is changing at time *t*, better known as the velocity of the object at time *t*, and *f”*(*t*) (the derivative of the derivative of *f*(*t*), usually called the second derivative of *f*(*t*)) represents the acceleration of the object at time *t*. Differential equations are equations that constrain *f*(*t*) by providing information about *f’*(*t*) (and sometimes *f”*(*t*) and higher derivatives too), and what Laplace meant by subjecting data to analysis is solving differential equations. Typically a differential equation, considered in isolation, has an infinity of solutions, all representing possible ways a system could evolve, but by specifying the initial state of the system, we can eliminate all but one of those solutions. That’s what physicists do for simple subsystems of the universe, and what Laplace’s hypothetical intelligence can do for the universe as a whole.

Among differential equations, none is simpler than *f’*(*t*) = 0, expressing the relation that as the quantity *t* changes, the quantity *f*(*t*) doesn’t change at all; a basic step in solving many problems in math and physics is passing from the assertion that *f’*(*t*) = 0 for all *t* to the conclusion that there exists some *c* such that *f*(*t*) = *c* for all *t*. The Constant Value Theorem is what gives us the right to draw this conclusion.

**The Constant Value Theorem**: If the function *f* doesn’t change (that is, if *f’*(*t*) = 0 for all real numbers *t*), then *f* is constant (that is, there exists *c* such that *f*(*t*) = *c* for all real numbers *t*).

Usually the Theorem is applied in interesting ways, for instance in talking about the conservation of the total energy in a pendulum, expressed as the sum of the (separately non-conserved) kinetic energy and potential energy. But the Theorem also makes sense in less physically interesting systems. For instance, if *f*(*t*) signifies the position of a particle at time *t*, so that *f’*(*t*) signifies the velocity of the particle at time *t*, then the logical leap from “*f’*(*t*) = 0 for all *t*” to “There exists some *c* such that *f*(*t*) = *c* for all *t*” amounts to the assertion that if an object has velocity 0 for all time, then it has some particular location for all time.

I’m sure you agree that there’d be something very wrong with physics, or at least with the concept of velocity, if an object could have velocity 0 for all time but fail to have constant position. Part of what we mean (or intend to mean) by saying that something has velocity 0 for all time is that it stays put! And that’s what the Constant Value Theorem tells us.

**THREE SURPRISES**

The mathematical proof of the Constant Value Theorem is surprisingly subtle. When you start to develop calculus rigorously from first principles, using precise definitions and ironclad logic, it turns out to be impossible to derive the Constant Value Theorem from the definition of the derivative using only the apparatus of high-school algebra. It’s very easy to prove the converse, namely, that if a function is constant then its derivative is zero. But in order to prove the Constant Value Theorem itself, we need to invoke what’s called the completeness property of the real number system, which informally speaking asserts that the number line has no gaps in it. This property didn’t emerge until over a century after Newton and Leibniz invented calculus. The fact that such a simple assertion is true but not easy to prove is, for me, the first surprising thing about the Constant Value Theorem.

The second surprising thing about the Constant Value Theorem is that it isn’t just a *prototype* for the kinds of theorems Laplace’s intelligence would depend on; it actually provides *proofs*. For instance, consider the theorems “If *f”*(*t*) = 0 for all *t*, then *f*(*t*) is a linear function” and “If *f’*(*t*) = *f*(*t*) for all *t*, then *f*(*t*) is an exponential function”. Both of them have a family resemblance to “If *f’*(*t*) = 0, then *f*(*t*) is a constant function”, so you might guess (correctly) that they can both be proved by mimicking the proof of the Constant Value Theorem, that is, by appealing to the completeness property of the reals. But what’s not so obvious is that you can prove these more complicated results without appealing directly to the completeness property of the reals at all, just by invoking the Constant Value Theorem itself in clever ways! (Endnotes #2 and #3 give examples, for those of you who know calculus.) The Constant Value Theorem is what makes Laplace’s vision viable. You might say that if Newton’s laws are the rails on which Newton’s universe runs, then the Constant Value Theorem is what keeps the universe from jumping the rails.

The third surprising thing about the Constant Value Theorem is that it isn’t just a *consequence* of the completeness property of the reals; it *implies* the completeness property of the reals. We normally think of the flow of implication in mathematics as being unidirectional: from the axioms we deduce lemmas, from the lemmas we deduce theorems, and from the theorems we deduce corollaries. I like to think of the implicational structure of calculus as something like a tree, and to envision the Constant Value Theorem as a piece of fruit hanging from a twig hanging from a branch hanging from a bigger branch growing out of the trunk of a tree that grew out of a seed containing the completeness axiom. Just as the fruit of a tree contains the DNA of the seed from which the tree grew, this theorem bears within itself the axiom from which it arose.^{4} I find that a satisfyingly organic state of affairs.

**THE FOURTH SURPRISE**

So far I’ve been talking about differential calculus, and have even been calling it, simply, “calculus”, as if there were no other kind. But… There’s another flavor of calculus that’s not the kind you meet in the classroom in your school. It’s the *other* flavor of calculus; it’s sometimes called “discrete”, and the person who devised it was a fellow named George Boole. This is the kind of calculus we use when we’re talking about processes that evolve in discrete time rather than continuous time. Examples of such processes that you’ve probably seen before include the doubling sequence 1, 2, 4, 8, 16, … and the Fibonacci sequence 1, 2, 3, 5, 8, … The first sequence, viewed as a function of time, satisfies *f*(*t*+1) = 2 *f*(*t*), while the second satisfies *f*(*t*+2) = *f*(*t*+1) + *f*(*t*).^{5}

The theory governing such equations, called discrete calculus or the difference calculus or the calculus of finite differences, is remarkably parallel to the theory governing differential equations. (If you’ve never seen it before, one place to learn the basics is Christopher Catone’s recent article, listed in the References.) In both setups, we find that the equations we want to solve, taken in isolation, typically have infinitely many solutions; for instance, any sequence of the form *c*, 2*c*, 4*c*, 8*c*, 16*c*, … is a solution to the equation *f*(*t*+1) = 2 *f*(*t*). To single out the solution we’re interested in, we have to make use of initial conditions (in this case, the initial condition *f*(0) = 1). In both the continuous and discrete settings, special equations called characteristic equations guide us toward the solution, and things are more complicated when the characteristic equation has repeated roots. The parallels go on and on.

I teach a course on discrete mathematics for computer scientists, and I train my students to solve discrete recurrence equations like the two mentioned above because the computer science department thinks the skill is important; but I feel a bit funny doing this, because nowadays computers can solve a broad class of problems like this on their own, and they’re better at it than people are, just as modern computers are better at solving differential equations than people are. (Did I mention that the George Boole who came up with discrete calculus is the same George Boole who came up with the Boolean logic that underlies the digital computer?)

But here’s where the main subject of this essay makes a dramatic return to the stage: when computers solve problems in discrete calculus, they often make implicit or explicit use of the principle that says that if a sequence of numbers isn’t changing from one term to the next, then that sequence must be constant. That is, if *f* is some function of discrete time satisfying *f*(1) − *f*(0) = 0 and *f*(2) − *f*(1) = 0 and *f*(3) − *f*(2) = 0 and so on, then there must be some constant *c* so that *f*(*t*) = *c* for all whole numbers *t*. This is precisely the discrete analogue of the Constant Value Theorem!^{6}

**The Discrete Constant Value Theorem**: If the sequence *f* doesn’t change (that is, if *f*(*t*+1) − *f*(*t*) = 0 for all whole numbers *t*), then *f* is constant (that is, there exists *c* such that *f*(*t*) = *c* for all whole numbers *t*).
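Unlike its continuous cousin, the discrete version has a one-line proof: telescoping. Every sequence is its starting value plus the running total of its differences, so if the differences all vanish, the sequence never leaves its starting value. A sketch (helper name mine):

```python
# Telescoping: f(t) = f(0) + (f(1)-f(0)) + (f(2)-f(1)) + ... + (f(t)-f(t-1)),
# so a sequence whose first differences are all 0 must be constant.

def reconstruct_from_differences(f0, diffs):
    """Rebuild a sequence from its starting value and its first differences."""
    values = [f0]
    for d in diffs:
        values.append(values[-1] + d)
    return values

# All differences zero  =>  constant sequence:
assert reconstruct_from_differences(7, [0] * 10) == [7] * 11
```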

What I find surprising (and deeply satisfying) here is that even though continuous mathematics and discrete mathematics are often taught as totally different subjects, there are deep affinities between them, such as the way the two Constant Value Theorems play central (albeit hidden) roles in their respective subjects. So it’s not just that calculus is one tree and discrete mathematics is another tree, each bearing fruit containing the DNA of the seed from which that fruit came; these two trees somehow converse with one another, as part of a beautiful unified ecosystem. It’s fun to listen in on what they say.

*Thanks to William Gasarch, Sandi Gubin, Kevin Knudson and Evelyn Lamb.*

Next month: Calculus is Deeply Irrational.

**ENDNOTES**

#1. If there’s Math Awareness Month and Pi Day, why not Calculus Week, or at least Calculus Weekend?

#2: The Constant Value Theorem says that if *f’*, the derivative of the function *f*, has the property *f’*(*t*) = 0 for all *t*, then *f*(*t*) is constant. But what can we conclude about *f*(*t*) if what we know is that *f”*(*t*) = 0 for all *t*?

The Constant Value Theorem, applied to *f’*, tells us that if *f”*(*t*) = 0 for all *t*, then *f’*(*t*) must be constant. Let’s give that constant the name *c*_{1}, so that *f’*(*t*) = *c*_{1} for all *t*, and let’s take *g*(*t*) = *f*(*t*) − *c*_{1} *t*. (Why? Because I’ve seen this trick before!) We can apply the Constant Value Theorem again, this time to *g*(*t*). The rules of differentiation give us *g’*(*t*) = *f’*(*t*) − *c*_{1}, but we know that *f’*(*t*) = *c*_{1} for all *t*, so *f’*(*t*) − *c*_{1} = 0 for all *t*, so *g’*(*t*) = 0 for all *t*. This implies (by the Constant Value Theorem) that *g*(*t*) must be constant. Let’s give that constant the name *c*_{0}. Then we have *f*(*t*) − *c*_{1} *t* = *g*(*t*) = *c*_{0} for all *t*, so *f*(*t*) = *c*_{1} *t* + *c*_{0} for all *t*. Now you see why I chose to call that second constant *c*_{0} rather than *c*_{2}: *c*_{1} is the coefficient of *t*^{1} and *c*_{0} is the coefficient of *t*^{0} in the linear polynomial *c*_{1} *t* + *c*_{0}.

Analogously, applying the Constant Value Theorem three times tells us that if *f”’*(*t*) = 0 for all *t*, then *f*(*t*) can be expressed in the form *c*_{2} *t*^{2} + *c*_{1} *t* + *c*_{0}. This has implications for Newtonian ballistics. If a projectile is traveling through a uniform vertical gravitational field, it’s undergoing constant downward force, and by Newton’s law *F* = *ma* relating force to acceleration, the projectile is undergoing constant downward acceleration. Since the derivative of a constant is 0 (that’s the easy converse of the Constant Value Theorem), the time-derivative of the downward acceleration must be 0, so (because of what I said in the first sentence of this paragraph) the vertical coordinate of the projectile must be given by a quadratic function of time. Meanwhile, the horizontal coordinate of the projectile must be given by a linear function of time, since its second derivative is zero (on account of the fact that there’s no force acting in the horizontal direction). This explains why the motion of the projectile follows a parabola. And it all follows from the Constant Value Theorem.
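The ballistics claim has a discrete echo that's easy to check by machine: sample a quadratic height function at evenly spaced times and its *second differences* come out constant, the discrete cousin of *f”’*(*t*) = 0. A sketch (the specific numbers are illustrative, not from the essay):

```python
from fractions import Fraction  # exact arithmetic, so the check is airtight

g = Fraction(98, 10)   # downward acceleration, m/s^2 (approx. Earth gravity)
v0 = Fraction(20)      # initial upward velocity, m/s (made up for the demo)
dt = Fraction(1, 10)   # time step, s

def y(t):
    """Vertical coordinate: a quadratic function of time."""
    return v0 * t - g * t * t / 2

samples = [y(n * dt) for n in range(50)]
first = [b - a for a, b in zip(samples, samples[1:])]
second = [b - a for a, b in zip(first, first[1:])]

# Second differences of a quadratic are constant (here, -g * dt^2):
assert all(d == -g * dt * dt for d in second)
```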

#3: One of the simplest differential equations that takes us out of the realm of polynomial functions is the differential equation *f’*(*t*) = *k f*(*t*); that is, the function *f*(*t*) is proportional to its own derivative. We teach students that all such functions are of the form *ce*^{kt} for some *c*, and they apply this to problems involving exponential growth or decay. But why are these functions the only solutions to this differential equation? As in Endnote #2, the Constant Value Theorem, applied with a certain amount of cleverness, provides the proof. Consider the auxiliary function *g*(*t*) = *f*(*t*) *e*^{−kt}. The rules for taking derivatives tell us that *g’*(*t*) = (*f’*(*t*)) (*e*^{−kt}) + (*f*(*t*)) (−*k* *e*^{−kt}) = (*f’*(*t*) − *k f*(*t*)) *e*^{−kt}, which equals 0 for all *t* if *f’*(*t*) = *k f*(*t*) for all *t*. The Constant Value Theorem now tells us that *g*(*t*) must be a constant; call that constant *c*. Then the equation *f*(*t*) *e*^{−kt} = *c*, rearranged, gives *f*(*t*) = *c* *e*^{kt}, as claimed.
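The auxiliary-function trick can be watched in action numerically. A minimal sketch (step size and constants chosen arbitrarily): march *f’* = *k f* forward with crude Euler steps, and *g*(*t*) = *f*(*t*) *e*^{−kt} stays essentially constant, drifting only by the discretization error.

```python
import math

k, f, t, dt = 0.5, 2.0, 0.0, 1e-5
g_start = f * math.exp(-k * t)   # g at t = 0, which is just f(0) = 2

while t < 1.0:
    f += dt * k * f              # Euler step for f' = k f
    t += dt

g_end = f * math.exp(-k * t)
# g is (nearly) constant along the numerical solution:
assert abs(g_end - g_start) < 1e-4
# and f(1) lands close to the exact solution c * e^{kt} with c = 2:
assert abs(f - 2.0 * math.exp(0.5)) < 1e-3
```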

#4: The enterprise of reversing the implicational flow of ordinary mathematics, and deriving axioms from theorems instead of the other way round, is called reverse mathematics. More precisely, in reverse mathematics we typically focus on one axiom in some axiomatic system, and show that, if all the other axioms are taken as true, then the singled-out axiom is logically equivalent to some theorem far out in the crown of the tree of implications. One of my favorite examples comes from Euclidean geometry. If one assumes that the first four of Euclid’s five axioms are true, but reserves judgment about the fifth (the most problematic of the five, known as the parallel postulate), then one can show that the parallel postulate is equivalent to the proposition that “A square exists”, by which I mean the proposition that there exists a quadrilateral with four equal sides and four right angles. It’s not surprising that in the presence of the first four axioms, the fifth axiom enables you to prove the existence of a square; what I find delightful is that in the presence of the first four axioms, the existence of a square (just one, anywhere in the plane!) allows you to deduce the parallel postulate.

During a period in my career when I taught Honors Calculus semester after semester, I got interested in figuring out which of the theorems I was teaching my students were equivalent to the completeness axiom (in the presence of all the other axioms governing the real numbers), and I wrote an article about it. I should say, by way of scholarly precision, that the kind of reverse mathematics I was practicing in this article is a much blunter affair than what most people call the reverse mathematics of the real numbers; I skirted over a whole lot of niceties that logicians rightly care about.

#5. In most textbooks on discrete mathematics one writes the equation *f*(*t*+2) = *f*(*t*+1) + *f*(*t*) as *f*_{n+2} = *f*_{n+1} + *f*_{n} , where the use of *n* rather than *t* highlights the discrete nature of the independent variable, and where the use of subscripts rather than arguments in parentheses accords with historical precedent; but this is mere notation, and it can distract us from the underlying similarity between the two contexts.

#6. There’s a great pedagogical opportunity in the teaching of integrals that, as far as I know, no calculus textbook author has ever exploited. When the integral is introduced via Riemann sums, we often show students formulas like 1^{2} + 2^{2} + 3^{2} + … + *n*^{2} = *n*(*n*+1)(2*n*+1)/6 that come into the story. This is exactly the sort of problem that the difference calculus was designed for. Yet instead of showing the students how the difference calculus gives solutions to these problems, we show them a method of proof called mathematical induction. Don’t get me wrong, I love induction, and I wouldn’t dream of teaching a discrete mathematics course without covering it. But it’s foreign to the spirit and methods of calculus, whereas the discrete Constant Value Theorem is the discrete counterpart of something students get to see when they learn about derivatives. This lost opportunity for reinforcing the conceptual unity of calculus has at times led me to half-seriously call for banning the practice of teaching proof by induction in calculus courses, and I’ve even given some talks on that theme. Just as the continuous Constant Value Theorem can serve as a replacement for other completeness properties of the reals, the discrete Constant Value Theorem can serve as a replacement for the axiom of induction. And the principle “What doesn’t change is constant” is part of the way computers “think” about such problems. So maybe we should teach more of our students to think that way too.
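To make the lost opportunity concrete, here's how the difference calculus handles the sum-of-squares identity: the polynomial *p*(*n*) = *n*(*n*+1)(2*n*+1)/6 is a discrete antiderivative of the squares, meaning *p*(*n*) − *p*(*n*−1) = *n*^{2}, and telescoping does the rest, with the discrete Constant Value Theorem (rather than explicit induction) guaranteeing uniqueness. A quick sketch:

```python
# p(n) = n(n+1)(2n+1)/6 is a discrete antiderivative of n^2:
# its backward difference is exactly n^2, so summing telescopes.

def p(n):
    return n * (n + 1) * (2 * n + 1) // 6   # always an exact integer

for n in range(1, 200):
    assert p(n) - p(n - 1) == n * n                       # difference property
    assert sum(i * i for i in range(1, n + 1)) == p(n)    # the sum formula
```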

**REFERENCES**

David Bressoud, Calculus Reordered: A History of the Big Ideas.

Christopher Catone, Bringing calculus into discrete math via the discrete derivative, College Mathematics Journal, January 2019 (Volume 50, issue 1).

Ben Orlin, Change is the Only Constant: The Wisdom of Calculus in a Madcap World.

James Propp, Real Analysis in Reverse. Published in The American Mathematical Monthly, Vol. **120**, No. 5 (May 2013), pp. 392-408.

James Propp, Bridging the gap between the continuous and the discrete (slides for a talk given in 2013).

James Propp, Don’t teach mathematical induction! (slides for a talk given in 2014). There was supposed to be a video of my talk, but unfortunately the movie-file appears to be audio-only.

James Propp, Calculus is Deeply Irrational (an essay I submitted to the 2019 Big Internet Math-Off)

Steven Strogatz, Infinite Powers: How Calculus Reveals the Secrets of the Universe.

I’m a pure mathematician with no background in applied mathematics. But lately I’ve been striving to make a name for myself in the less-crowded field of *mis*-applied mathematics, and bogus science more broadly.

Now you may be asking yourself, is bogus science really less crowded a field than good science? After all, if Sturgeon’s law (“Ninety percent of everything is crap”) applies to science, then we can expect crappy science to predominate over the good kind. But bogosity transcends mere crappiness. For something to be bogus, I think there must be an attempt to deceive. Or at least, there must be the *appearance* of an attempt to deceive. Sometimes the appearance is itself a sham, and that’s the kind of second-order bogosity I enjoy practicing, when I try my hardest to act like someone who genuinely believes (and wants others to believe) a nonsensical theory.

My forum is the Festival of Bad Ad Hoc Hypotheses (BAHFest), held periodically in various locations around the world (San Francisco, Seattle, Cambridge, Sydney, and London). It’s a celebration of well-argued and thoroughly researched but completely incorrect scientific theories. BAHFest is dedicated to the proposition that no matter how absurd a premise is, you can find a way to abuse the tools of science to support your cause and make people laugh in the process. (Or make nerds laugh, anyway.)

BAHFest was the brainchild of Zach Weinersmith, whose Infantapulting Hypothesis got the game going.

One of my favorite BAHFest talks (and, as it happens, another one that centers on babies) is Tomer Ullman’s “The Crying Game”.

If you have a favorite of your own, please submit it in the Comments!

Developing a BAHFest premise is a little bit like doing math, insofar as math is the game of formulating precise assumptions, based in reality or not, and seeing where they lead. The biggest difference is that in pure math, internal consistency is all that matters, whereas in science, Reality is the ultimate arbiter of value. When the science is bad, Reality says “That’s not true!” When the science is not merely bad but bogus, the scientist tells Reality “That’s what YOU say!” and shoves a bound and gagged Reality into a closet, keeping up a steady stream of patter to drown out Reality’s muffled protests. And when the bogosity is second-order bogosity, as in a BAHFest talk, the sonic overlay of the scientist’s patter and Reality’s protests is where the humor comes from. The more shameless the abuse of Reality, the funnier the talk.

**EVOLUTION**

A lot of BAHFest presentations parody evolutionary psychology, because the field lends itself so readily to parody: with a little creativity you can find evidence to support almost any Just-So Story (oops, I mean *hypothesis*) about how those funny creatures called modern humans got to be the way they are, and nobody can travel back in time to do experiments to test (and refute) your hypothesis. But if you want an example of true bogosity of the un-funny kind, I think it would be hard to top Scientific Creationism, recently rebranded as Intelligent Design theory. As a piece of mis-applied mathematics, its probabilistic argument against Darwinism is in a class of its own (though if you know of some bogus math you find even more egregious, please post it to the Comments!).

Before addressing the mathematical “refutation” of Darwinism, let’s consider a popular debased version of it, promulgated by fiction writer Dean Koontz^{1} (in his novel “Breathless”) and by other writers with less flair and smaller readerships. Koontz puts the anti-evolution argument in the mouth of a character named Lamar Woolsey, a mathematician specializing in chaos theory. The argument goes that even if we assume a mutation rate of one mutation per microsecond, there isn’t time in the history of our planet to allow as complicated a creature as *Homo sapiens* to reach its current level of complexity: the number of bits of information in the genome of even the simplest of creatures exceeds the number of microseconds available for its evolution. The problem with this argument is that in truth the inequality goes the other way: the number of bits of information in the genome of even the most *complex* of creatures is dwarfed by the number of microseconds available for its evolution. Koontz’s numerical claim is just a flat-out (and, I would like to believe, unintentional) falsehood.

More interesting is the “correct” version of this argument, which hinges on the fact that the number of microseconds in the history of our planet, or even the number of nanoseconds in the history of our universe, is minuscule in comparison with the number of *possible genomes*, which is an exponential function of the number of bits of data in our particular genome. (The human genome has only about three billion base pairs, but the number of three-billion-base-pair genomes is roughly four to the power of three billion, which is much, much larger.) It’s argued that there’s no way that evolution could have hit upon the recipe for making humans any more than a bunch of monkeys banging away at typewriters could have hit upon the recipe for crème brûlée.^{2}
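The disparity is easy to quantify in rough, back-of-the-envelope terms (all values below are approximate): the *number of digits* in the genome count already dwarfs the nanosecond count itself.

```python
import math

base_pairs = 3.1e9                 # approximate size of the human genome
# 4**base_pairs is too big to compute directly, so count its decimal digits:
genome_count_digits = base_pairs * math.log10(4)

# Age of the universe (~13.8 billion years) in nanoseconds:
universe_age_ns = 13.8e9 * 365.25 * 24 * 3600 * 1e9

assert genome_count_digits > 1.8e9   # ~1.9 billion DIGITS in the genome count
assert universe_age_ns < 1e27        # ~4.4e26 nanoseconds, a mere 27 digits
```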

It’s absolutely true that the number of possible genomes of the same order of complexity as the human genome is vastly greater than the number of nanoseconds that our universe has existed — absolutely true, and absolutely irrelevant. That’s because the accepted model of evolution isn’t brute-force exhaustive search or random exploration. It’s a guided search through a fitness landscape with a meaningful gradient that says “It’s probably good to go farther this way” or “It’s probably bad to go farther that way.” That is, evolution isn’t like sifting a haystack in search of a needle; it’s more like searching for an iron needle in a haystack using a metal detector. Plus, there are probably a lot more viable genomes for a human-type creature than the loosely-defined consensus we call the human genome; there could be plenty of equally good needles in the evolutionary haystack.
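The metal-detector point can be dramatized with a toy computation. This is in no way a model of actual biology; it's just a sketch contrasting blind search (expected time ~2^{60} guesses) with gradient-guided search (a few hundred steps) for an arbitrary 60-bit "needle":

```python
import random

random.seed(0)
N = 60
target = [random.randrange(2) for _ in range(N)]   # the "needle"
genome = [random.randrange(2) for _ in range(N)]   # random starting point

def fitness(g):
    """Number of positions agreeing with the target."""
    return sum(a == b for a, b in zip(g, target))

# Guided search: flip one random bit at a time, keep non-worsening changes.
steps = 0
while genome != target:
    trial = genome.copy()
    trial[random.randrange(N)] ^= 1
    if fitness(trial) >= fitness(genome):
        genome = trial
    steps += 1

assert genome == target
assert steps < 10_000    # vastly fewer than the ~2**60 of blind search
```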

Dean Koontz is too canny a writer to have his mouthpiece Lamar Woolsey actually subscribe to Creationism. Instead, Lamar says that the science of evolution is not yet settled. This kind of assertion is a very popular trope in science denialism, and one of the best in its arsenal, going back to the very dawn of the Enlightenment and beyond; 17th century defenders of Church doctrine suggested that, instead of advocating heliocentrism, Galileo should teach the Ptolemy-Copernicus controversy.^{3} Perhaps the greatest milestone in the Pseudoscientific Revolution was the discovery that skepticism, in addition to serving as a tool in the pursuit of truth, could be enlisted as a source of deliberate misinformation.

Koontz’s Lamar Woolsey says: “Darwinian evolution offends me simply as a mathematician, as it does virtually every mathematician who has ever seriously thought about it” (and even asserts that “evolutionists hate mathematicians” because mathematicians point out the flaws in evolutionary theory). One thoughtful mathematician who’s taken the time to explain what’s wrong with probability-based criticisms of Darwinism is David Bailey, whose articles “Does probability refute evolution?” and “Misuse of probability by ‘creation scientists'” should be required reading for anyone who (rightly) marvels at the complexity of the living world and is inclined to conclude (wrongly) that natural processes cannot possibly account for it.

Do probabilistic arguments against Darwinism meet my somewhat stringent definition of flimflam, which requires an attempt to deceive? That depends on who’s making the argument. If it’s someone who hasn’t encountered the sort of counter-arguments David Bailey discusses so lucidly, then no — it’s just someone spreading misinformation. But if someone has seen those counterarguments and ignores them anyway, then yes, it’s flimflam.

There are critiques that can be made (and have been made) of Darwin’s original vision, but the ones I’ve seen don’t refute Darwin’s vision as much as complicate it, and they don’t point to the existence of huge explanatory gaps that can only be filled by invoking some radically different principle (such as meddling aliens, or a Flying Spaghetti Monster, or some other sort of intelligent Creator).

Of course, if you are 100% convinced that a belief in Darwinism leads to a loss of faith in God that in turn causes souls to be damned for all eternity, then it is your godly duty to continue to trot out arguments against Darwinism that you know to be bogus. But you shouldn’t pretend you’re engaged in rational debate. (Actually, you *should*, if that pretense will save souls. And if that mendacity costs you your own soul, you should still tell those lies, since the salvation of the many believers you’ll convert counts for more than the damnation of your own soul. Sacrificing yourself in this fashion would be a marvelously selfless action on your part. So selfless, in fact, that maybe it shouldn’t lead to your damnation after all! But I digress.)

If you want to prove the existence of God with probability, there’s a much more efficient way to do it, without the bother of invoking evolution. All you need is a coin. Toss the coin a hundred times and record the outcome as a length-100 sequence of H’s (Heads) and T’s (Tails). Now, according to the secular humanist theory of coin-tossing (which is only a theory, mind you!), each of the two-to-the-hundredth power different sequences of H’s and T’s is just as likely as every other^{4}; so the probability of your having obtained this *particular* sequence of H’s and T’s is only one out of two-to-the-hundredth-power, or less than .000000000000000000000000000001. Now, by the very standards that these secular humanist probabilists claim to adhere to (dare I say, the very standards that these people *worship*), such a low probability is a rigorous disproof of the hypothesis under consideration! So using statisticians’ *own methods*, we’ve ruled out the Materialist theory of coin-toss outcomes, leaving its rival, the Providential theory, as the most plausible explanation. This means that your sequence of H’s and T’s is more than just a proof of God’s existence; it’s *a message from God*. Get busy decoding it!
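The arithmetic behind the gag is genuine even though the inference is bogus: *every* particular length-100 sequence, dull or dazzling, has the same vanishing probability.

```python
# Under the fair-coin model, each specific length-100 sequence has
# probability (1/2)**100, regardless of how "special" it looks.
p_all_heads = 0.5 ** 100      # HHHH...H
p_alternating = 0.5 ** 100    # HTHT...T, or any other fixed sequence
assert p_all_heads == p_alternating < 1e-30
# The flimflam: SOME sequence had to occur, so observing a
# one-in-2**100 event, by itself, disproves nothing.
```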

**HOW TO LIE WITH STATISTICS**

If probability theory lends itself to bogus forms of argument, statistics does so even more, especially if “statistics” is understood in the broadest sense and is taken to include the art of presenting data visually so as to create the visual impression that helps your case. As a teenager, I loved Darrell Huff’s book “How to Lie With Statistics”, and you can see traces of its influence in my BAHFest talk on eclipses, awe, and (of course) evolution.

In the talk, I advocate spending a quadrillion dollars to build a wall all the way around the moon. How can I present the cost in a way that minimizes sticker-shock? Why, using a logarithmic scale, of course! (And also by introducing the notion of “logarithmic dollars”, which sounds meaningful but isn’t.)

The big mathematical scam in my talk comes a little bit earlier, when I try to support my claim that there’s a causal link between human brain volume and “eclipse dosage” (defined as the fraction of the time that a spot on Earth witnesses total eclipses of the sun; it’s decreasing everywhere, albeit slowly, as the Moon gets farther and farther away from the Earth). I start with a disabling punch to people’s math anxiety by showing a slide that introduces inappropriate mathematical technology (Lagrange’s formula for polynomial interpolation) using needlessly intimidating notation.^{5} The reason I say that Lagrange interpolation is inappropriate is that I apply it in the case of just two data-points, so that “fitting with a polynomial” is just a pretentious way of saying “drawing a straight line through the points”. Of course, using just two data-points is statistically suspect from the get-go, but I make use of this ludicrously underpowered analysis in an even more bogus way, by tacitly arguing that IF the two data-simplifications (both straight lines: one for eclipses and one for brains) match up, THEN it implies a correlation between the original data. As it happens, the straight lines *don’t* match up, but I make them match up by using another trick I learned from “How to Lie With Statistics”, namely, adjusting the vertical axis to make the graph tell the story I want it to tell. (As the saying goes, “If you torture the data long enough, it will confess to anything.”)
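To see just how little the fancy machinery buys with two points, here's a sketch (data points invented for illustration): the two-point Lagrange formula and the humble point-slope line agree everywhere.

```python
# With n = 2 data points, Lagrange interpolation IS the straight line.

def lagrange_two_points(x0, y0, x1, y1, x):
    # Lagrange form: y0 * (basis poly for x0) + y1 * (basis poly for x1)
    return y0 * (x - x1) / (x0 - x1) + y1 * (x - x0) / (x1 - x0)

def straight_line(x0, y0, x1, y1, x):
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)

for x in [0.0, 1.5, 7.0, -3.2]:
    assert abs(lagrange_two_points(2, 5, 6, 13, x)
               - straight_line(2, 5, 6, 13, x)) < 1e-12
```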

And none of this even *mentions* my bogus conflation of correlation with causation. If I did my job as a BAHFest speaker, this didn’t occur to you while I was giving my talk, because part of the art of flimflam is distraction, and I did my best to distract.

**QUESTIONS NOBODY ASKED**

In preparation for presenting my hypothesis and having to answer questions afterwards, I tried to anticipate the questions I might be asked. My feeling is that, just as good fantasy and science-fiction writers often create more convincing worlds by coming up with extra details that never actually find their way onto the page, a BAHFest speaker can give a better talk if they really psych themselves into buying their own pitch.

As it happens, nobody asked the questions I came up with, but in case you’ve seen the video and want to know a little bit more about the “world-building” I did in advance, here are the questions nobody asked and the answers I would have given if someone had.

Q. You gave a Lamarckian hypothesis about the evolution of human intelligence, and then invoked a general “Equivalence Principle” that you said guaranteed that there’d be a more complicated Darwinian hypothesis that was essentially equivalent. But you never presented that Darwinian hypothesis. Could you present it now?

A. I was hoping nobody would ask. The answer is a bit, um, unpleasant. But since you asked: There’s natural genetic variation in skull size. When early humans with large skulls saw a total eclipse, their heads could accommodate the extra brain matter; when early humans with smaller skulls saw a total solar eclipse, their heads exploded. This severely compromised their reproductive fitness, especially if the explosion was forceful and the individual’s existing offspring were nearby at the time of the explosion. As evidence, I mention the fact that at many archeological sites we find not just intact skulls but skull *fragments*. Do we really have to dwell on this?

Q. Isn’t part of awe a feeling of humility? If so, and if humans *know* that human technology is behind making eclipses great again, wouldn’t that knowledge undermine feelings of awe and cancel out the brain trauma and all its benefits?

A. You’re 100% right. That’s why, if my plan moves forward, it’ll have to be done in secrecy. Plus, I’ll have to ask Zach to remove this video from the web, so that we can blame the sudden appearance of the lunar wall on space aliens.

Q. Have you talked to anyone in the Trump administration about this project?

A. Funny you should ask. They really liked my idea of a wall, but they didn’t like the solar energy part; they’d prefer to see first if there are fossil fuels on the moon that can be exploited. Which makes more sense than you might think, since the catastrophe that caused the KT extinction could have flung some dinosaur carcasses into outer space, and some of them could have landed on the moon. So there could be petrochemicals there.

Q. You talked about the wall having a positive effect on the lunar economy. But what about its effects on lunar society?

A. I think the wall will divide lunar society into two classes. All the rich people will live on the near side of the wall that faces the lights of Earth, the “Fun Side”, and all the poor people will live on the far side that faces interplanetary space, the “Dull Side”. And people being what they are, some of the dregs of lunar society will try to enjoy the good life by sneaking over to the Fun Side, instead of earning those perks the honest way, by choosing rich ancestors. So the wall will also serve as a barrier to illegal migration.

Q. When I was a professional football player^{6}, I was worried about brain damage; I even retired early because of concerns about football making my brain unable to do mathematics after my retirement. But now you’re telling me that brain-trauma may actually have been making me smarter? Please clarify.

A. I’m so glad you’re here tonight, John, so that I can apologize to you in person on behalf of the entire scientific community, for squandering the opportunity you presented to us during your career in the NFL. We could’ve done fMRIs, biopsies, you name it, to help us better understand the difference between the good kind of brain damage that occurs when you metaphorically bang your head against a math problem, and the bad kind that occurs when you literally bang your head against other people. You could’ve been the Phineas Gage of the 21st century, immortalized in Neurology 101 textbooks! Instead, you’re just another mathematician. So: sorr-yyy…

Q. You cited a study that showed that inflammation can promote synaptic plasticity in some situations. But isn’t there far more evidence that inflammation is *bad* for the brain?

A. Well, yes. But the science isn’t settled yet, and I think it’s fair to say that the truth is somewhere in between the extremes. In the meantime, I say we should teach the controversy.

Q. Do you really think you can build a wall around the moon for just a quadrillion dollars? I mean, a quadrillion dollars doesn’t go as far as it used to.

A. Well, remember that the raw materials are already on the moon, and the energy can come from the sun. So the main expense is getting equipment and personnel to the moon, and yes, I think a quadrillion would do it, if you’re satisfied with building a mile-high wall that would roll the eclipse clock back 28,000 years. But if we’re willing to spend a quintillion dollars, we could do so much more. We could build a hundred-mile-high wall with rotatable solar panels that could selectively block or admit the sun’s light. Then we could have eclipses where the lunar disk appears to wobble around, and the solar corona does The Wave on the periphery. Wouldn’t that be awesome? I can feel my brain swelling already just imagining it.

**WHAT NEXT?**

Why do I do BAHFest? Because it’s fun. And because, as I said at the end of my first BAHFest appearance, “In a time when real science is mistaken for fake science and bad science masquerades as good science, it is so important to make a place in this world for bad science that says that it’s bad science.”

But (hardball question) *why* is it so important? Well, maybe it does some good. Showing people examples of how science can go wrong can help reinforce the principles that keep science right. For instance, I’d like to think that if stats professors tell jokes about how football causes winter, etc., it’ll help their students avoid the mistake of confusing correlation with causation. But has anyone actually studied this?

I’m reminded of musical satirist Tom Lehrer‘s assessment of his own impact on American society; he liked to quote Peter Cook who, on the opening of his Establishment Club in London, said it was modeled on “those wonderful Berlin cabarets which did so much to stop the rise of Hitler”.

My BAHFest co-presenters and I try to make our talks funny, but underlying all these funny presentations is the somewhat depressing fact that it’s scarily easy to come up with rationales for crazy theories (witness the recent resurgence of Flat Earth-ism). People believe what they want to believe, and can be quite inventive when it comes to dismissing evidence that, taken at face value, would force them to abandon cherished beliefs. By doing this dismissing in as shameless a way as possible, I tried to amuse an audience for ten minutes, and the video is out there on the internet. I’d like to believe that BAHFest talks help make people more skeptical. But somehow, for all my ability to entertain outlandish ideas, I can’t quite make myself believe that.

And yet, giving a BAHFest talk is fun, so I’m gonna do it again. I’ll submit another deliberately bogus hypothesis next year. And no, I won’t tell you ahead of time what it is.

Next month (July 17): My Favorite Theorem.

*Thanks to Sandi Gubin, Tom Knight, Christian Lawson-Perfect, Henry Picciotto, Ben Orlin, Evan Romer, Kelly Weinersmith, and Zach Weinersmith.*

**ENDNOTES**

#1. I actually enjoy Koontz enormously and have a lot of admiration for his storytelling chops. His early work tends toward purplitude and overuse of certain words and tropes, but over the years he’s shed the distracting tics of his early work and emerged as a writer of surprisingly broad interests. Unfortunately sometimes the depth of his understanding doesn’t match the breadth of his interests. One webpage I’ve seen suggests that Koontz got his ideas about evolution from one Robert Webster Kehr, who not only claims that mathematics disproves evolution but also denies the existence of photons.

#2. In addition to loving the taste of crème brûlée, I love the spelling, with one accent mark per syllable. “Brûlée” is one of my favorite menu-participles, right up there with “drizzled”.

#3. For more on Galileo and other figures in the Scientific Revolution, the different forms of push-back they had to contend with, and the strategies they came up with, see Robert Crease’s book, listed in the References.

#4. If you haven’t taken a course in probability, you might be inclined to think that the different possible sequences aren’t equally likely; for instance, you might think that if you toss a coin four times, the outcome **HHHH** is less likely than the outcome **HTHT**. In that case, give it a try! If you perform the experiment often enough, you’ll find that both of these outcomes tend to occur about one-sixteenth of the time.
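If you’d rather enumerate than toss, a few lines of Python (my illustration, not part of the original experiment) make the point exactly: among the sixteen equally likely outcomes, **HHHH** and **HTHT** each show up exactly once.

```python
from itertools import product

# Enumerate all 2**4 equally likely outcomes of four fair coin tosses.
outcomes = [''.join(seq) for seq in product('HT', repeat=4)]

# Every specific sequence -- HHHH, HTHT, or any other -- appears
# exactly once among the 16, so each has probability 1/16.
print(len(outcomes), outcomes.count('HHHH'), outcomes.count('HTHT'))  # → 16 1 1
```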

#5. Here’s the formula:

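A formula of the kind being poked at here — to be clear, my guess at the sort of notation in question, not necessarily the exact formula — is the Lagrange interpolation polynomial with the data points dragged along as subscripts:

```latex
P_{(x_0,y_0),\dots,(x_n,y_n)}(x) \;=\; \sum_{i=0}^{n} y_i \prod_{\substack{0 \le j \le n \\ j \neq i}} \frac{x - x_j}{x_i - x_j}
```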
Sure, the polynomial that fits the data points depends on the data points being fitted, but is it really necessary to include the data points as subscripts in the notation? (Yes, if one’s goal is to intimidate!)

#6. John Urschel was in fact one of the judges at BAHFest East 2019 (and I knew ahead of time that he would be), so the specificity of that imaginary Q-and-A exchange wasn’t an accident. He and his wife have written a book about his dual career.

**REFERENCES**

David Bailey, “Does probability refute evolution?” (last revised 2019).

David Bailey, “Misuse of probability by ‘creation scientists’” (2009).

Robert Crease, “The Workshop and the World”, 2019.

Here’s what Gardner says about The Brain:

*It consists of a tower of eight transparent plastic disks that rotate horizontally around their centers. The disks are slotted, with eight upright rods going through the slots. The rods can be moved to two positions, in or out, and the task is to rotate the disks to positions that permit all the rods to be moved out. The Gray code supplies a solution in 170 moves.*

To find out about the Gray code, you can read Joseph Malkevitch’s September 2008 article “Gray codes”.
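For the impatient, the reflected binary Gray code is a one-liner; here’s a quick Python sketch (mine, not Malkevitch’s) showing that successive codes differ in just one bit:

```python
def gray(n):
    """Reflected binary Gray code of n: just n XOR (n >> 1)."""
    return n ^ (n >> 1)

# the 4-bit Gray code sequence; consecutive entries differ in exactly one bit
codes = [gray(i) for i in range(16)]
print([format(c, '04b') for c in codes[:4]])  # → ['0000', '0001', '0011', '0010']
```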

It’s interesting to compare speed-solving The Brain with speed-solving Rubik’s cube (check out the new Wired Magazine video about the latter!). The two solving processes might seem similar to an uninformed observer who only sees flying fingers and hears clacking plastic, but in terms of the mental processes involved, the tasks couldn’t be more different. At every stage in the process of solving a Rubik’s cube there are a dozen moves available, so if you’re hoping to make progress you need to have a very clear idea of what you’re doing and what you’ll do next. In contrast, The Brain only permits two moves at each stage along the way, so you can’t help making progress toward your goal as long as you don’t get confused and undo the move you just made. In that respect, solving The Brain is less like exploring a maze and more like walking through a labyrinth with a single non-branching path, such as one sees in a finite approximation to the infinitely twisty Hilbert curve. Thing-maker Santiago Ortiz has devised a Hilbert curve labyrinth that’s so easy even a marble can solve it. One could build a scaled-up version of this marble run that would accommodate a human rather than a marble, but I predict that anyone going down such a slide would feel quite ill by the time they reached the bottom.
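If you’d like to generate such a finite approximation yourself, here’s a sketch of the well-known bit-manipulation algorithm for Hilbert curves (from the standard literature, not from Ortiz’s piece) that converts distance along the curve into grid coordinates:

```python
def d2xy(n, d):
    """Map distance d along a Hilbert curve filling an n-by-n grid
    (n a power of two) to (x, y) grid coordinates."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/reflect this quadrant as needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# walk an 8-by-8 approximation: the single non-branching path visits
# every cell, moving to an adjacent cell at each step
path = [d2xy(8, d) for d in range(64)]
```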

**ESSAYS**

There were a number of good essays this month. Evelyn Lamb’s meta-parody I Can Has Numberz? was one of the oddest and (I have to say this) one of the cutest. First the Internet begat “Aren’t cats cute?” posts; then there were parodic “Aren’t humans cute?” posts; and now Lamb gives us an “Aren’t humans cute when they do math?” post as the first entry in a new genre that so far has but a single exemplar. Here’s hoping she does a sequel on the philosophy of mathematics. I mean, don’t you love it when a species that hasn’t encountered even *one* civilization from another planet pontificates about the timeless, universal features of its mathematics? It’s like when toddlers proudly announce what they’re going to be when they grow up. It’s *sooooo* adorable.

John Urschel wrote an op-ed that appeared in the New York Times, on Why Math Teachers Should Be More Like Football Coaches; it made me wonder if I’ve been sufficiently encouraging to students of mine who could have used encouragement.

Mark Dominus wrote an interesting piece called Math Jargon Failures about ways in which standard mathematical terminology makes things harder, rather than easier, for people who are trying to get a handle on the underlying concepts.

I liked many things about Robbert Dijkgraaf’s The Subtle Art of the Mathematical Conjecture. My favorite bit was this passage:

*In fact, the metaphor of scaling a summit does not adequately capture the full impact of a proof. Once the conjecture is proved, it is not so much the endpoint of an arduous journey but rather the starting point of an even greater adventure. A much more accurate image is that of a mountain pass, the saddle point that allows one to traverse from one valley into another.*

**NEWS**

Michael Griffin, Ken Ono, Larry Rolen and Don Zagier have given new life to an old idea about the Riemann Hypothesis; their work may or may not help our planet’s mathematical culture get an actual proof anytime soon, but it sheds new light on the Hypothesis. See the press release Mathematicians revive abandoned approach to Riemann Hypothesis, Enrico Bombieri’s summary of the paper, or the paper itself.

This year’s Abel Prize went to geometer Karen Uhlenbeck. Now that the Abel Prize and the Fields Medal (recently awarded to Maryam Mirzakhani) and the Salem Prize (awarded in 2006 to Stefanie Petermichl and more recently to Maryna Viazovska) have gone to women, it doesn’t seem unreasonable to hope that I’ll live to see the *last* “first woman to win …” mathematical news story! (I’m looking at you, Wolf Foundation.)

Speaking of Viazovska and her groundbreaking work in 2016 (applying modular forms to the study of higher-dimensional sphere-packing), there’s been a fabulous follow-up result, wonderfully described by Erica Klarreich. For those just now learning about this saga-in-progress, here’s a synopsis of the action so far: Cohn and Elkies showed in 2001 that if a certain “magic” function existed, it would have implications for sphere-packing in 8- and 24-dimensional space; Viazovska showed in 2016 that the function existed; and now Cohn and Viazovska, joined by Kumar, Miller, and Radchenko, have extended those results to other sorts of geometric problems about sticking together things that can’t (or “don’t want to”) get too close to each other.^{2}

The 2001 article contained a near-miss construction of the required magic function, foreshadowing Viazovska’s eventual exact construction. But some near-misses can’t be fixed. For instance, the infamous “Horgan surface”, which some computer simulations seem to approximate, doesn’t actually exist, and Evelyn Lamb wrote an article a couple of years ago about polyhedra that don’t actually exist either, even though one can build paper models of them. (The catch is that the models aren’t — *can’t* be — mathematically perfect, but the imperfections are hard to spot with the naked eye.) Lamb’s article centered on Craig Kaplan, who has developed a fondness for these impossible shapes and their nearly-accurate real-world models. Now a cabal of chemists, with Kaplan’s help, has actually built such near-miss objects at molecular scale.

For those more interested in stacking 3-dimensional spheres than 8- or 24-dimensional ones, there’s something new to report in that department as well. If you want to make a stable stack of tennis balls, you can make ridiculously big ones by getting gravity and friction to work to your advantage.

The crucial idea is that a tennis ball at the top of a stack can serve as a kind of keystone, frictionally locking the balls below it into their proper places. Andria Rogava, who has been exploring this world of novel structures, writes “I can find no mention of such structures online and am sure they would have interested Martin Gardner — that great fan of recreational science — were he still alive.” I agree! I’m glad to see that Rogava’s work got the attention of the Daily Mail. Rogava has created a Facebook page for people interested in knowing more about these structures.

Here’s some new news about some old news. The old news is Edward Lorenz’s work on the weather, and its implications for the role of chaos in physical systems; the phrase “butterfly effect” has even become a part of popular culture. (The original metaphor involved seagulls, not butterflies, flapping their wings, but I think “butterfly effect” is a better name, not only because of the internal alliteration, and because butterflies are smaller than seagulls, but also because the Lorenz attractor, one of the first rigorously analyzed examples of a chaotic system, has a phase plot resembling a butterfly.) What’s recently come to light is that two women played an important role in Lorenz’s early work on chaos theory, as Lorenz himself acknowledged in his pioneering article but other people did not. Find out about Ellen Fetter and Margaret Hamilton in Joshua Sokol’s article Hidden Heroines of Chaos.
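For readers who want to see the butterfly for themselves, here’s a minimal Python sketch (my own, using a crude Euler step) of the Lorenz system with its classic chaotic parameters:

```python
# The Lorenz system with the standard parameters sigma=10, rho=28, beta=8/3,
# integrated by a simple (crude but adequate) Euler step.
def lorenz_trajectory(steps=10000, dt=0.005, x=1.0, y=1.0, z=1.0,
                      sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    points = []
    for _ in range(steps):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        points.append((x, y, z))
    return points

traj = lorenz_trajectory()
# plotting x against z reveals the two-lobed "butterfly" of the attractor
```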

Lastly, there’s been some work in social psychology that uses mathematics as a testing ground for theories about when people claim knowledge they don’t have. The researchers asked subjects to assess their state of knowledge about such bogus concepts as “proper numbers” and “declarative fractions”. Who do you think was more likely to claim they knew the meanings of these meaningless terms: boys or girls? kids from wealthy households or kids from poor households? (Spoiler alert: You probably won’t be surprised by the answers.)

**ODDS AND ENDS**

I learned about many of the above articles through Twitter. If you aren’t on Twitter and you want to see what’s available, I suggest the list at TrueSciPhi as a good place to start. Some people whose tweets I especially enjoyed in May were Robert Fathauer (@RobFathauerArt), Roice Nelson (@TilingBot), Vincent Pantaloni (@panlepan), Catriona Shearer (@Cshearer41), and Dave Whyte (@beesandbombs).

There have been some good new videos this month. Check out Chalk of Champions and Numberphile: Peaceable Queens. There are also two new Math Encounters videos, courtesy of the National Museum of Mathematics (where the videos were made) and the Simons Foundation (which paid for production costs): Doug McKenna’s Golden textures: the art of dissecting golden geometries (January 2019) and Robbert Dijkgraaf’s Space, time and the fourth dimension (March 2019). There are lots of older Math Encounters videos worth checking out. Since the word “ergodic” was a spelling word in this year’s national spelling bee^{3}, I recommend Bryna Kra’s talk Patterns and disorder: how random can random be? (February 2014). There’s a short film about Karen Uhlenbeck, courtesy of the Abel Prize Institute. And lastly, if you missed the National Math Festival in May, you can still watch a Numberphile video showing some highlights from the festival.

There’s also some good math content in audio form on the internet (though most mathematical content creators understandably gravitate toward media that make it easier to incorporate visuals). There was nothing new from Samuel Hansen’s Relatively Prime podcast this month, but there was a new episode of Kevin Knudson and Evelyn Lamb’s My Favorite Theorem podcast, featuring Moon Duchin. You can also learn more about Duchin’s work from an interview with her that appeared in Science News.

Finally, there’s David Eppstein’s article Playing with model trains and calling it graph theory describing work he’s done with Demaine, Hesterberg, Jain, Lubiw, Uehara, and Uno; the authors showed that certain sorts of routing problems (think Rush Hour, but with trains) are PSPACE-complete. Results like that are really encouraging for folks like me (and Martin Gardner) who like puzzles but aren’t actually that good at solving them, because what these theorems say to us is: “You’re not stupid; these puzzles are genuinely hard!”

Next time (June 17): Mathematical Flimflam.

**ACKNOWLEDGMENTS**

Thanks to Tom Duff, Sandi Gubin, Brian Hayes, Michael Joseph, Michael Kleber, Evelyn Lamb, Andy Latto, Joseph Malkevitch, Evan Romer, and Katie Steckles.

**ENDNOTES**

#1. Why 170? The number 170 belongs to the exclusive club of numbers whose binary expansion alternates between 1 and 0 (**170**_{ten} equals **10101010**_{two}). This set of positive integers occurs as entry A000975 in the Handbook of Integer Sequences (the most fantastically useful book for mathematicians in the galaxy, now available as an online resource).
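Here’s a quick Python check (my own illustration) that builds the alternating-binary numbers and confirms that 170 is the eighth member of the club:

```python
def alternating(n):
    """The n-th positive integer whose binary digits alternate
    1,0,1,0,... starting with 1 (OEIS A000975)."""
    return int('10' * (n // 2) + '1' * (n % 2), 2)

terms = [alternating(n) for n in range(1, 9)]
print(terms)          # → [1, 2, 5, 10, 21, 42, 85, 170]
print(bin(170))       # → 0b10101010
```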

#2. The history of the new results in sphere-packing is a little bit more complicated than was described above.

Viazovska found the magic function for 8-dimensional sphere-packing on her own; she, Cohn, Kumar, Miller and Radchenko found the (different) magic function for 24-dimensional space a week later. This solved the sphere-packing problem in dimension 8 and dimension 24.

Now the five have extended their earlier work to apply to problems involving mutually-repelling points in 8-dimensional space and 24-dimensional space. The connection between sphere-packing and mutually repelling points comes from looking at the centers of the spheres in a packing. If the spheres are of radius *r*, the centers can never be closer than distance 2*r* from one another. One could instead imagine a physical system in which points can be close but don’t “want” to be, in the sense that having points be close together requires expending energy to overcome a repulsive force, and the system tries to find equilibrium configurations that minimize total energy. This is the setting in which the new work takes place. Part of what makes the new result so amazing is its generality: you might expect the details of the force law to play a role, but they turn out not to matter.
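To get a feel for energy-minimizing configurations of mutually repelling points, here’s a toy Python sketch (a drastically simplified illustration, nothing like the actual techniques in the paper): five points on a circle, repelling via a 1/r energy, relaxed by gradient descent until they spread themselves out.

```python
import math
import random

def energy(angles):
    """Total 1/r repulsive energy of points on the unit circle."""
    e = 0.0
    for i in range(len(angles)):
        for j in range(i + 1, len(angles)):
            # chord length between the two points on the circle
            r = 2.0 * abs(math.sin((angles[i] - angles[j]) / 2.0))
            e += 1.0 / r
    return e

def relax(angles, steps=2000, lr=1e-3, h=1e-6):
    """Plain gradient descent, with a finite-difference gradient."""
    angles = list(angles)
    for _ in range(steps):
        base = energy(angles)
        grad = []
        for i in range(len(angles)):
            bumped = list(angles)
            bumped[i] += h
            grad.append((energy(bumped) - base) / h)
        angles = [a - lr * g for a, g in zip(angles, grad)]
    return angles

random.seed(1)
start = [random.uniform(0, 2 * math.pi) for _ in range(5)]
end = relax(start)
# repulsion pushes the points toward the equally spaced minimum-energy configuration
```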

#3. If you check out the painful-to-watch video of elite 8th grade speller Aanson Cook misspelling “ergodic” (starting at 0:30), you’ll notice that he went to some trouble to try to determine whether the third syllable had a “d” or a “t”. In the end he incorrectly guessed that the word was “ergotic” and that the announcer was doing that thing wherein native English speakers pronounce a “t” like a “d”. (As linguists say, “Phones aren’t the same as phonemes”.) Or maybe Mr. Cook isn’t good at decoding the high-frequency part of a vocal waveform that determines the difference between “d” and “t”.

**REFERENCES**

Enrico Bombieri, New progress on the zeta function: From old conjectures to a major breakthrough.

Henry Cohn, Abhinav Kumar, Stephen D. Miller, Danylo Radchenko, and Maryna Viazovska, Universal optimality of the *E*_{8} and Leech lattices and interpolation formulas (preprint), February 2019.

Martin Gardner, “The Binary Gray Code”, available as Chapter 2 in Gardner’s book *Knotted Doughnuts and Other Mathematical Entertainments*.


Evelyn Lamb, The Impossible Mathematics of the Real World, Nautilus, June 2017.

Joseph Malkevitch, “Gray codes”, AMS Features column, September 2008.
