# My Favorite Theorem

2019 is a great year to learn some calculus. Not only are there the videos of Robert Ghrist and Grant Sanderson, but there’s a wonderful new book out by Steven Strogatz. Strogatz has spent the last thirty years growing into the kind of writer who could produce the book about calculus that the world needs, and now he’s produced it. In a few months Ben Orlin will be coming out with a book of his own, and the chapters I’ve seen make me wish I had the power to magically forget calculus (temporarily), so I could have the experience of encountering the subject for the first time through Orlin’s delightful combination of lively prose and cutely inept drawings. And as if that weren’t enough, this year we also have David Bressoud’s clarion call for teachers to improve the pedagogy of calculus by putting its standard topics back into something like the order in which they were discovered. Calculus is having a gala year.

The celebration is long overdue. (See Endnote #1.) Calculus is one of the triumphs of the human spirit, and a demonstration of what perfect straight things (and perfect curvy things) can be made from the crooked timber of humanity. It’s given us a way of seeing order amidst the variety and confusion of reality, hand-in-hand with the expectation that when things happen, they happen for a reason, and that when surprising things happen, it’s time to look for new forces or additional variables.

One of my favorite theorems is a calculus theorem, but it’s not a theorem anyone talks about very much. It may seem mundane (if you’re mathematically sophisticated) or silly (if you’re not). It’s seldom stated, and when it is stated, it’s a lowly lemma, a plank we walk across on the way to our true destination. But it’s a crucial property that holds the real number line together and makes calculus predictive of what happens in the world (as long as we stay away from chaotic and/or quantum mechanical systems). It’s called the Constant Value Theorem, and it can be stated as a succinct motto: “Whatever doesn’t change is constant.” (This is not to be confused with the motto “Change is the only constant”, which happens to be the title of Orlin’s book.) I’ll tell you four things about this theorem that I find surprising and beautiful.

I spoke about the Constant Value Theorem earlier this month on the My Favorite Theorem podcast, so you might want to listen to that episode before or after you read this essay. In a playful vein, I suggest that if the Constant Value Theorem weren’t true, we’d live in a scarily unpredictable universe in which objects could jump around or change course willy-nilly. But it would be more accurate to say that without the Constant Value Theorem, calculus wouldn’t have the predictive power that’s made it such a successful tool in helping us model and master our world.

THE CELESTIAL ORRERY

Let’s start the story with mathematician Pierre-Simon Laplace. Laplace had a vision of the universe as a giant orrery, running on invisible tracks laid out by Newton’s laws. He wrote:

Given for one instant an intelligence which could comprehend all the forces by which nature is animated and the respective positions of the beings which compose it, if moreover this intelligence were vast enough to submit these data to analysis, it would embrace in the same formula both the movements of the largest bodies in the universe and those of the lightest atom; to it nothing would be uncertain, and the future as the past would be present to its eye.

Newton’s laws, as understood by Laplace, are expressed as equations describing the way various quantities change over time. Let f(t) represent some time-dependent quantity, where t represents time; then f’(t) (called the derivative of f(t) with respect to time) is defined to be the rate at which f(t) is changing at time t. For instance, if f(t) represents the position of some object at time t, then f’(t) represents the rate at which the position is changing at time t, better known as the velocity of the object at time t, and f”(t) (the derivative of the derivative of f(t), usually called the second derivative of f(t)) represents the acceleration of the object at time t. Differential equations are equations that constrain f(t) by providing information about f’(t) (and sometimes f”(t) and higher derivatives too), and what Laplace meant by subjecting data to analysis is solving differential equations. Typically a differential equation, considered in isolation, has an infinity of solutions, all representing possible ways a system could evolve, but by specifying the initial state of the system, we can eliminate all but one of those solutions. That’s what physicists do for simple subsystems of the universe, and what Laplace’s hypothetical intelligence can do for the universe as a whole.

Among differential equations, none is simpler than f’(t) = 0, expressing the relation that as the quantity t changes, the quantity f(t) doesn’t change at all; a basic step in solving many problems in math and physics is passing from the assertion that f’(t) = 0 for all t to the conclusion that there exists some c such that f(t) = c for all t. The Constant Value Theorem is what gives us the right to draw this conclusion.

The Constant Value Theorem: If the function f doesn’t change (that is, if f’(t) = 0 for all real numbers t), then f is constant (that is, there exists c such that f(t) = c for all real numbers t).

Usually the Theorem is applied in interesting ways, for instance in talking about the conservation of the total energy in a pendulum, expressed as the sum of the (separately non-conserved) kinetic energy and potential energy. But the Theorem also makes sense in less physically interesting systems. For instance, if f(t) signifies the position of a particle at time t, so that f’(t) signifies the velocity of the particle at time t, then the logical leap from “f’(t) = 0 for all t” to “There exists some c such that f(t) = c for all t” amounts to the assertion that if an object has velocity 0 for all time, then it has some particular location for all time.

I’m sure you agree that there’d be something very wrong with physics, or at least with the concept of velocity, if an object could have velocity 0 for all time but fail to have constant position. Part of what we mean (or intend to mean) by saying that something has velocity 0 for all time is that it stays put! And that’s what the Constant Value Theorem tells us.

THREE SURPRISES

The mathematical proof of the Constant Value Theorem is surprisingly subtle. When you start to develop calculus rigorously from first principles, using precise definitions and ironclad logic, it turns out to be impossible to derive the Constant Value Theorem from the definition of the derivative using only the apparatus of high-school algebra. It’s very easy to prove the converse, namely, that if a function is constant then its derivative is zero. But in order to prove the Constant Value Theorem itself, we need to invoke what’s called the completeness property of the real number system, which informally speaking asserts that the number line has no gaps in it. This property didn’t emerge until over a century after Newton and Leibniz invented calculus. The fact that such a simple assertion is true but not easy to prove is, for me, the first surprising thing about the Constant Value Theorem.
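To get a feel for why completeness matters, here is a small Python sketch of what can go wrong on a number line that has gaps. It works only with rational inputs (via the standard-library `Fraction` type); the names `step` and `difference_quotient` and the sample point 7/5 are illustrative choices of mine, not standard terminology. Because no rational number squares to 2, the function below is locally constant at every rational point, so its derivative (taken within the rationals) is 0 everywhere, and yet it is not constant.

```python
from fractions import Fraction


def step(x: Fraction) -> int:
    """A function on the rationals: 0 where x*x < 2, 1 where x*x > 2.

    No rational squares to exactly 2, so every rational lands in one case.
    """
    return 0 if x * x < 2 else 1


def difference_quotient(f, x: Fraction, h: Fraction) -> Fraction:
    """Exact slope of f between x and x + h."""
    return Fraction(f(x + h) - f(x)) / h


# Near a rational point just below the gap, small steps see no change at all...
x = Fraction(7, 5)
slopes = [difference_quotient(step, x, Fraction(1, 10**k)) for k in range(3, 8)]

# ...yet the function takes two different values on the rationals.
values = (step(Fraction(1)), step(Fraction(2)))

print(slopes)
print(values)
```

This is only a finite spot check at one point, not a proof, but it shows the shape of the problem: without completeness, "derivative zero everywhere" no longer forces "constant".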

The second surprising thing about the Constant Value Theorem is that it isn’t just a prototype for the kinds of theorems Laplace’s intelligence would depend on; it actually provides proofs. For instance, consider the theorems “If f”(t) = 0 for all t, then f(t) is a linear function” and “If f’(t) = f(t) for all t, then f(t) is an exponential function”. Both of them have a family resemblance to “If f’(t) = 0, then f(t) is a constant function”, so you might guess (correctly) that they can both be proved by mimicking the proof of the Constant Value Theorem, that is, by appealing to the completeness property of the reals. But what’s not so obvious is that you can prove these more complicated results without appealing directly to the completeness property of the reals at all, just by invoking the Constant Value Theorem itself in clever ways! (Endnotes #2 and #3 give examples, for those of you who know calculus.) The Constant Value Theorem is what makes Laplace’s vision viable. You might say that if Newton’s laws are the rails on which Newton’s universe runs, then the Constant Value Theorem is what keeps the universe from jumping the rails.

The third surprising thing about the Constant Value Theorem is that it isn’t just a consequence of the completeness property of the reals; it implies the completeness property of the reals. We normally think of the flow of implication in mathematics as being unidirectional: from the axioms we deduce lemmas, from the lemmas we deduce theorems, and from the theorems we deduce corollaries. I like to think of the implicational structure of calculus as something like a tree, and to envision the Constant Value Theorem as a piece of fruit hanging from a twig hanging from a branch hanging from a bigger branch growing out of the trunk of a tree that grew out of a seed containing the completeness axiom. Just as the fruit of a tree contains the DNA of the seed from which the tree grew, this theorem bears within itself the axiom from which it arose. (See Endnote #4.) I find that a satisfyingly organic state of affairs.

THE FOURTH SURPRISE

So far I’ve been talking about differential calculus, and have even been calling it, simply, “calculus”, as if there were no other kind. But… There’s another flavor of calculus that’s not the kind you meet in the classroom in your school. It’s the other flavor of calculus; it’s sometimes called “discrete”, and the person who devised it was a fellow named George Boole. This is the kind of calculus we use when we’re talking about processes that evolve in discrete time rather than continuous time. Examples of such processes that you’ve probably seen before include the doubling sequence 1, 2, 4, 8, 16, … and the Fibonacci sequence 1, 2, 3, 5, 8, … The first sequence, viewed as a function of time, satisfies f(t+1) = 2 f(t), while the second satisfies f(t+2) = f(t+1) + f(t). (See Endnote #5.)

The theory governing such equations, called discrete calculus or the difference calculus or the calculus of finite differences, is remarkably parallel to the theory governing differential equations. (If you’ve never seen it before, one place to learn the basics is Christopher Catone’s recent article, listed in the References.) In both setups, we find that the equations we want to solve, taken in isolation, typically have infinitely many solutions; for instance, any sequence of the form c, 2c, 4c, 8c, 16c, … is a solution to the equation f(t+1) = 2 f(t). To single out the solution we’re interested in, we have to make use of initial conditions (in this case, the initial condition f(0) = 1). In both the continuous and discrete settings, special equations called characteristic equations guide us toward the solution, and things are more complicated when the characteristic equation has repeated roots. The parallels go on and on.
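The role of the initial condition can be sketched in a few lines of Python. The helper names `doubling_solution` and `satisfies_doubling` are hypothetical, chosen just for this illustration: every starting value c gives a sequence satisfying f(t+1) = 2 f(t), and only the choice f(0) = 1 yields the doubling sequence itself.

```python
def doubling_solution(c, length):
    """First `length` terms of the solution to f(t+1) = 2*f(t) with f(0) = c."""
    terms = [c]
    for _ in range(length - 1):
        terms.append(2 * terms[-1])
    return terms


def satisfies_doubling(seq):
    """Check that each term is twice the one before it."""
    return all(b == 2 * a for a, b in zip(seq, seq[1:]))


# Many different sequences solve the same recurrence...
family = {c: doubling_solution(c, 5) for c in (1, 3, -2)}
for c, seq in family.items():
    print(c, seq, satisfies_doubling(seq))

# ...and the initial condition f(0) = 1 singles out one of them.
chosen = family[1]
```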

I teach a course on discrete mathematics for computer scientists, and I train my students to solve discrete recurrence equations like the two mentioned above because the computer science department thinks the skill is important; but I feel a bit funny doing this, because nowadays computers can solve a broad class of problems like this on their own, and they’re better at it than people are, just as modern computers are better at solving differential equations than people are. (Did I mention that the George Boole who came up with discrete calculus is the same George Boole who came up with the Boolean logic that underlies the digital computer?)

But here’s where the main subject of this essay makes a dramatic return to the stage: when computers solve problems in discrete calculus, they often make implicit or explicit use of the principle that says that if a sequence of numbers isn’t changing from one term to the next, then that sequence must be constant. That is, if f is some function of discrete time satisfying f(1) − f(0) = 0 and f(2) − f(1) = 0 and f(3) − f(2) = 0 and so on, then there must be some constant c so that f(t) = c for all whole numbers t. This is precisely the discrete analogue of the Constant Value Theorem! (See Endnote #6.)

The Discrete Constant Value Theorem: If the sequence f doesn’t change (that is, if f(t+1) − f(t) = 0 for all whole numbers t), then f is constant (that is, there exists c such that f(t) = c for all whole numbers t).
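A minimal Python sketch of the discrete statement, with function names (`forward_difference`, `rebuild`, `is_constant`) that are my own labels rather than standard ones. The forward difference plays the role of the derivative, and rebuilding a sequence from its first term plus its running differences shows why all-zero differences leave nothing but the first term repeated.

```python
from itertools import accumulate


def forward_difference(seq):
    """Discrete analogue of the derivative: the list of f(t+1) - f(t)."""
    return [b - a for a, b in zip(seq, seq[1:])]


def rebuild(first, diffs):
    """Recover a sequence from f(0) and its differences (a telescoping sum)."""
    return list(accumulate(diffs, initial=first))


def is_constant(seq):
    return all(term == seq[0] for term in seq)


unchanging = [7, 7, 7, 7, 7]
doubling = [1, 2, 4, 8, 16]

print(forward_difference(unchanging), is_constant(unchanging))
print(forward_difference(doubling), is_constant(doubling))
print(rebuild(7, [0, 0, 0, 0]))
```

The `initial` keyword of `accumulate` needs Python 3.8 or later. A finite list can only illustrate the theorem, of course; the statement itself is about all whole numbers t.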

What I find surprising (and deeply satisfying) here is that even though continuous mathematics and discrete mathematics are often taught as totally different subjects, there are deep affinities, such as the way the two different Constant Value Theorems can be seen as playing such deep (albeit hidden) roles in both subjects. So it’s not just that calculus is one tree and discrete mathematics is another tree, each bearing fruit containing the DNA of the seed from which that fruit came; these two trees somehow converse with one another, as part of a beautiful unified ecosystem. It’s fun to listen in on what they say.

Thanks to William Gasarch, Sandi Gubin, Kevin Knudson and Evelyn Lamb.

Next month: Calculus is Deeply Irrational.

ENDNOTES

#1. If there’s Math Awareness Month and Pi Day, why not Calculus Week, or at least Calculus Weekend?

#2: The Constant Value Theorem says that if f’, the derivative of the function f,  has the property f’(t) = 0 for all t, then f(t) is constant. But what can we conclude about f(t) if what we know is that f”(t) = 0 for all t?

The Constant Value Theorem, applied to f’, tells us that if f”(t) = 0 for all t, then f’(t) must be constant. Let’s give that constant the name c1, so that f’(t) = c1 for all t, and let’s take g(t) = f(t) − c1 t. (Why? Because I’ve seen this trick before!) We can apply the Constant Value Theorem again, this time to g(t). The rules of differentiation give us g’(t) = f’(t) − c1, but we know that f’(t) = c1 for all t, so f’(t) − c1 = 0 for all t, so g’(t) = 0 for all t. This implies (by the Constant Value Theorem) that g(t) must be constant. Let’s give that constant the name c0. Then we have f(t) − c1 t = g(t) = c0 for all t, so f(t) = c1 t + c0 for all t. Now you see why I chose to call that second constant c0 rather than c2; c1 is the coefficient of t^1 and c0 is the coefficient of t^0 in the linear polynomial c1 t + c0.

Analogously, applying the Constant Value Theorem three times tells us that if f”’(t) = 0 for all t, then f(t) can be expressed in the form c2 t^2 + c1 t + c0. This has implications for Newtonian ballistics. If a projectile is traveling through a uniform vertical gravitational field, it’s undergoing constant downward force, and by Newton’s law F = ma relating force to acceleration, the projectile is undergoing constant downward acceleration. Since the derivative of a constant is 0 (that’s the easy converse of the Constant Value Theorem), the time-derivative of the downward acceleration must be 0, so (because of what I said in the first sentence of this paragraph), the vertical coordinate of the projectile must be given by a quadratic function of time. Meanwhile, the horizontal coordinate of the projectile must be given by a linear function of time, since its second derivative is zero (on account of the fact that there’s no force acting in the horizontal direction). This explains why the motion of the projectile follows a parabola. And it all follows from the Constant Value Theorem.

#3: One of the simplest differential equations that takes us out of the realm of polynomial functions is the differential equation f’(t) = k f(t); that is, the function f(t) is proportional to its own derivative. We teach students that all such functions are of the form c e^(kt) for some c, and they apply this to problems involving exponential growth or decay. But why are these functions the only solutions to this differential equation? As in Endnote #2, the Constant Value Theorem, applied with a certain amount of cleverness, provides the proof. Consider the auxiliary function g(t) = f(t) e^(−kt). The rules for taking derivatives tell us that g’(t) = (f’(t)) (e^(−kt)) + (f(t)) (−k e^(−kt)) = (f’(t) − k f(t)) e^(−kt), which equals 0 for all t if f’(t) = k f(t) for all t. The Constant Value Theorem now tells us that g(t) must be a constant; call that constant c. Then the equation f(t) e^(−kt) = c, rearranged, gives f(t) = c e^(kt), as claimed.
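As a numerical sanity check (not a proof), the following Python sketch evaluates the auxiliary function g(t) = f(t) e^(−kt) at a few sample times. The values k = 0.5 and c = 3.0, the sample times, and the helper name `g` are arbitrary illustrative choices. When f is an exponential solution, g comes out the same at every sample; when f is something else, such as t squared, it does not.

```python
import math


def g(f, k, t):
    """Auxiliary function g(t) = f(t) * e^(-k*t)."""
    return f(t) * math.exp(-k * t)


k, c = 0.5, 3.0
times = (-2.0, 0.0, 1.0, 4.0)


def exponential(t):
    """A solution of f'(t) = k*f(t): f(t) = c * e^(k*t)."""
    return c * math.exp(k * t)


def square(t):
    """Not a solution of f'(t) = k*f(t)."""
    return t * t


samples = [g(exponential, k, t) for t in times]
other = [g(square, k, t) for t in times]

print(samples)  # each value agrees with c up to floating-point rounding
print(other)    # these values differ from one another
```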

#4: The enterprise of reversing the implicational flow of ordinary mathematics, and deriving axioms from theorems instead of the other way round, is called reverse mathematics. More precisely, in reverse mathematics we typically focus on one axiom in some axiomatic system, and show that, if all the other axioms are taken as true, then the singled-out axiom is logically equivalent to some theorem far out in the crown of the tree of implications. One of my favorite examples comes from Euclidean geometry. If one assumes that the first four of Euclid’s five axioms are true, but reserves judgment about the fifth (the most problematic of the five, known as the parallel postulate), then one can show that the parallel postulate is equivalent to the proposition that “A square exists”, by which I mean the proposition that there exists a quadrilateral with four equal sides and four right angles. It’s not surprising that in the presence of the first four axioms, the fifth axiom enables you to prove the existence of a square; what I find delightful is that in the presence of the first four axioms, the existence of a square (just one, anywhere in the plane!) allows you to deduce the parallel postulate.

During a period in my career when I taught Honors Calculus semester after semester, I got interested in figuring out which of the theorems I was teaching my students were equivalent to the completeness axiom (in the presence of all the other axioms governing the real numbers), and I wrote an article about it. I should say, by way of scholarly precision, that the kind of reverse mathematics I was practicing in this article is a much blunter affair than what most people call the reverse mathematics of the real numbers; I skirted over a whole lot of niceties that logicians rightly care about.

#5. In most textbooks on discrete mathematics one writes the equation f(t+2) = f(t+1) + f(t) as f_{n+2} = f_{n+1} + f_n, where the use of n rather than t highlights the discrete nature of the independent variable, and where the use of subscripts rather than arguments in parentheses accords with historical precedent; but this is mere notation, and it can distract us from the underlying similarity between the two contexts.

#6. There’s a great pedagogical opportunity in the teaching of integrals that, as far as I know, no calculus textbook author has ever exploited. When the integral is introduced via Riemann sums, we often show students formulas like 1^2 + 2^2 + 3^2 + … + n^2 = n(n+1)(2n+1)/6 that come into the story. This is exactly the sort of problem that the difference calculus was designed for. Yet instead of showing the students how the difference calculus gives solutions to these problems, we show them a method of proof called mathematical induction. Don’t get me wrong, I love induction, and I wouldn’t dream of teaching a discrete mathematics course without covering it. But it’s foreign to the spirit and methods of calculus, whereas the discrete Constant Value Theorem is the discrete counterpart of something students get to see when they learn about derivatives. This lost opportunity for reinforcing the conceptual unity of calculus has at times led me to half-seriously call for banning the practice of teaching proof by induction in calculus courses, and I’ve even given some talks on that theme. Just as the continuous Constant Value Theorem can serve as a replacement for other completeness properties of the reals, the discrete Constant Value Theorem can serve as a replacement for the axiom of induction. And the principle “What doesn’t change is constant” is part of the way computers “think” about such problems. So maybe we should teach more of our students to think that way too.
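Here is one way the sum-of-squares formula might be approached in that spirit, sketched in Python with exact integer arithmetic. The function names (`sum_of_squares`, `closed_form`, `gap`) and the cutoff of 200 are assumptions made for the illustration. The idea is to look at the gap between the running sum and the proposed closed form: if the gap never changes from one n to the next and starts at 0, the discrete Constant Value Theorem says it is 0 forever.

```python
def sum_of_squares(n):
    """1^2 + 2^2 + ... + n^2, computed term by term."""
    return sum(k * k for k in range(1, n + 1))


def closed_form(n):
    """The proposed formula n(n+1)(2n+1)/6 (always a whole number)."""
    return n * (n + 1) * (2 * n + 1) // 6


def gap(n):
    """Difference between the running sum and the proposed formula."""
    return sum_of_squares(n) - closed_form(n)


# The gap starts at 0 and does not change from n to n + 1 (checked for n < 200).
differences = [gap(n + 1) - gap(n) for n in range(200)]

print(gap(0))
print(set(differences))
```

The code only checks finitely many cases. The actual proof replaces the loop with one line of algebra: both the running sum and the closed form grow by exactly (n+1)^2 when n increases by 1, so the gap's difference is 0 for every n.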

REFERENCES

David Bressoud, Calculus Reordered: A History of the Big Ideas.

Christopher Catone, Bringing calculus into discrete math via the discrete derivative, College Mathematics Journal, January 2019 (Volume 50, issue 1).

James Propp, Real Analysis in Reverse. Published in The American Mathematical Monthly, Vol. 120, No. 5 (May 2013), pp. 392-408.

James Propp, Bridging the gap between the continuous and the discrete (slides for a talk given in 2013).

James Propp, Don’t teach mathematical induction! (slides for a talk given in 2014). There was supposed to be a video of my talk, but unfortunately the movie-file appears to be audio-only.

James Propp, Calculus is Deeply Irrational (an essay I submitted to the 2019 Big Internet Math-Off)

Steven Strogatz, Infinite Powers: How Calculus Reveals the Secrets of the Universe.

## 16 thoughts on “My Favorite Theorem”

1. Pingback: Mathematical Flimflam |

2. fc

Hello, professor! I enjoyed this article very much, and I find the lines you draw between discrete mathematics, trees, and orreries entrancing! I just graduated high school, having taken the IB with the calculus option in my maths class and I wish I had known this before…On another note, when you talk about the CVT keeping the universe from going off the rails, I found that comforting. Not to generalise, but especially in the world of physics I feel as if everything keeps getting stranger and stranger (Heisenberg’s uncertainty principle, for one, puzzled me quite a bit) so it’s nice to know mathematics, albeit with its own share of oddities, contains these roots, like a universal language that can help us understand the blueprints of everything around us. I am looking forward to reading your next article.

1. jamespropp Post author

Glad you liked it! In a way it’s too bad we live in Heisenberg’s universe rather than Newton’s; pushing forward with the metaphor, we might say that the wheels and the rails are probability clouds, and that jumping the rails isn’t really impossible — just exponentially unlikely.

1. fc

I had to google probability clouds, and when I did an electron cloud model popped up, so that was a very cool metaphor! Worked like a matryoshka (or the Mandelbrot set when you zoom in, if we were to use a maths metaphor). Reading articles like this, I’m so excited to learn more, college starts in two days (Yale-NUS, we’d be so honored if you visited sometime!) so there’s plenty of time then.

3. Paolo

Really interesting post! I have one question. I think it’s easy to prove the Discrete Constant Value Theorem by induction, but how can I prove the Principle of Mathematical Induction using the Discrete Constant Value Theorem? I tried to do that, by defining a sequence whose n-th term is 1 if the n-th proposition is true and 0 if the n-th proposition is false. But I can’t show that this sequence is unchanging.

1. jamespropp Post author

Great question! Apply the Constant Value Theorem to the sequence whose nth term is the truth value of the CONJUNCTION of the first n propositions.

1. Paolo

Thank you!

4. Pingback: Calculus is Deeply Irrational |

5. mdiamond123

The Constant Value Theorem seems focused on a specific derivative across all values of a function. But what about all derivatives for a specific value? It seems to me that a continuous function containing a point with derivatives of every order equal to 0 must be a constant function. Or, put another way: there can be no point in a continuous non-constant function with all derivatives of every order equal to 0.

Is there a name for this idea? Because I find it to be poetic: that change cannot arise out of complete stasis, but must be present eternally to be present at all.

1. jamespropp Post author

One of my favorite counterexamples from real analysis is that if we define f(x) = e^(-1/x^2) for all nonzero x, with f(0) = 0, then f is infinitely differentiable everywhere, and its derivatives of all orders take the value 0 at x=0, even though f is nonconstant!

6. mdiamond123

Good point!

Is there a term for a stronger sense of continuity that disallows that kind of removable singularity? If so, then my point would apply to that class of functions, I think.

1. mdiamond123

Or it might be an essential singularity? Sorry if I’m butchering the terminology.

Perhaps the theorem I suggested would only apply to polynomials… more limited, but still something.

1. jamespropp Post author

Your idea applies to polynomials, and in complex analysis it applies to entire functions.
