Life seems to take sly pleasure in magnifying the effects of chance. A spot of bad luck in your love-life can leave you embittered and wary, making you less romantically attractive. If a delayed paycheck makes a check you wrote bounce, you’ll instantly get charged overdraft fees, plus penalties if your balance can’t cover the fees. On the up side, if you’re slightly taller than the other kids in your gym class, you may get more playing time on the basketball court and become the best player, even if by the end of the year your height has regressed to the mean. Life is full of things like this, but we often lose sight of that truth. We say “You win a few, you lose a few”, as if in the long run things are bound to even out. In life, it’s often “You win a few, so you’re likelier to win the next few; you lose a few, so you’re likelier to lose the next few.”
Our intuitions about luck are colored by our most popular metaphors for chance events: tossing a coin and rolling a die. Coins have no memory; no matter how many times a fair coin has landed heads, the outcome of the next toss is still a 50-50 proposition. Likewise for dice. Both metaphors steer us toward “you-win-some-you-lose-some” thinking. What’s missing from our collective stock of metaphors is a good metaphor for chance processes that display memory.
As it happens, there’s a simple mathematical model of a random process in which initial imbalances are magnified over time, and if it were better known outside the narrow precincts of math and statistics, it might incline people toward a richer view of the ways randomness can drive systems in the real world. The process was invented by Hungarian mathematician George Pólya, and is called the Pólya urn. It’s too simple to be a detailed model of any specific real-world situation, but it can give us a way to think about phenomena like racial inequality in the workplace.
Before we introduce Pólya’s urn process, let’s consider a different process, namely, the process of making babies. Leaving aside rare events and statistically small effects (identical twins, intersexed children, differences in miscarriage rates based on the sex of the fetus, etc.) we’ll assume that when a couple has a child, the child is either male or female, and the child’s sex is a 50-50 proposition, regardless of the sexes of the couple’s pre-existing children. Suppose a family has 2 boys and 1 girl. If the family goes on to have three more children, then on average half of the three extra children will be male and half will be female (that is, 1.5 new children of each gender on average), so that on average we expect the family to end up with 3.5 boys and 2.5 girls.
If your imagination balks at the idea of a six-child family (to say nothing of an “average” family with 3.5 boys!), imagine an urn containing two white balls and one black ball. We also have a limitless supply of extra white and black balls not in the urn, as well as a fair coin. We toss the coin three times; each time the coin comes up heads, we add a white ball to the urn, and each time the coin comes up tails, we add a black ball to the urn. We toss the coin three times, so that three new balls are put into the urn. On average we expect that the urn will end up with 3.5 white balls and 2.5 black balls.
Pólya suggested a variation on this mechanism. When it’s time to add a ball to the urn, don’t toss a coin; instead, shake up the urn, choose a ball from the urn at random, and let the color of the new ball be the color of the ball you chose. That is: pick a ball at random from the urn. If it’s white, put it back, along with a new white ball; if it’s black, put it back, along with a new black ball.
Before we think about what will happen if we do this operation three times, let’s be clear about what happens when we do it once, starting from an urn containing 2 white balls and 1 black ball (schematically, WWB). We have a two-out-of-three chance of drawing one of the two white balls, and in that case, our urn will end up with 2+1=3 white balls and 1 black ball (WWWB); we have a one-out-of-three chance of drawing the black ball, and in that case, our urn will end up with 2 white balls and 1+1=2 black balls (WWBB). See Figure 1.
On an intuitive level, we can already see how this might play out if the process is repeated. The more white balls the urn contains, the more likely it is that the new ball that’s added will be white, further pushing the composition of the urn in the white direction. Likewise, the more black balls the urn contains, the more likely it is that the new ball that’s added will be black, further pushing the composition of the urn in the black direction. Still, that sort of qualitative argument won’t tell us how many white balls we expect the urn to contain once we’ve done the operation three times.
Here’s a cute trick will help us. Imagine that the two white balls initially in the urn are marked with a 1 and a 2 and the black ball is marked with a 3. Whenever we put a new ball in the urn, we mark it with a number as well, namely, whatever number was on the ball that dictated its color. That is: when we draw a ball from the urn, we record what number it shows, and when we add to the urn a new ball of the same color as the one we drew (along with the one we drew), we write that number on it. So some of the new white balls get marked with a 1 while others are marked with a 2, and all new black balls are marked with a 3. The reason we introduce this odd complication into our scheme is that it brings symmetry into the situation. There may be an initial imbalance between black and white balls, but the three marks 1, 2, and 3 are perfectly balanced at the start of the experiment. What’s more, the rule for evolving the system can be described in such a way that color plays no role except at the end when, as an afterthought, we color the balls marked 1 or 2 white and color all the balls marked 3 black. Since the initial state was symmetrical between the three marks (1, 2, and 3), and since the evolution rule was symmetrical, the expected number of occurrences of each mark must be the same; that is, at the end of the process, when there are six balls in the urn, we expect on average to see two balls marked “1”, two marked “2”, and two marked “3” — which means that, on average, there will be four white balls and two black balls.
Let’s review what we’ve seen about the two urn models (“non-Pólya” and Pólya). In the non-Pólya model, starting with 2 white balls and 1 black ball, we ended up three time steps later with an urn that, on average, contains 3.5 white balls and 2.5 black balls. In the Pólya model, starting with 2 white balls and 1 black ball, we ended up three time steps later with an urn that, on average, contains 4 white balls and 2 black balls — a larger absolute disparity than we started with. If we continue to evolve both models, we find that on average, the non-Pólya model has an average-case behavior with 1 more white ball than black ball, so that the average color balance is close to 50-50, while in the Pólya model, the average-case behavior has 2/3 of the balls being white and only 1/3 being black, reflecting the fact that the initial state had 2 white balls and 1 black ball. If you take the same sort of reasoning we used above and apply it to longer versions of the experiment in which more than three new balls are added to the urn, you’ll see that the non-Pólya urn model preserves (on average) the absolute disparity between the numbers of white balls and black balls, while the Pólya urn model preserves (on average) the relative disparity.
But averages can deceive, as in the myth of the statistician who drowned in a stream whose average depth was 6 inches. If you go to the Success Equation Simulations website for the Pólya urn model, you’ll be able to play with the Pólya urn model and see for yourself why the average-case behavior of this model is not such a good guide to what the model actually does. (To ward off potential confusion, I’ll mention that the app calls the balls “marbles” and that they don’t come in white, but I’ll continue to talk about black and white balls.) If you start the urn with 2 white balls and 1 black ball, and you look at what happens when the number of balls in the urn becomes 100, you’ll see that much of the time, the final proportion of white balls in the urn isn’t very close to 2/3, even though (as is shown by the “1, 2, 3” marking trick described above) that proportion is 2/3 on average.
To make your discomfort more acute, I ask you to imagine starting the urn with 1 white ball and 1 black ball, with a statistician peering over your shoulder throughout the process, offering mathematically correct but potentially misleading predictions. At the start, the statistician will happily announce (based on the symmetry between white and black) that if you add ninety-eight more balls following the Pólya protocol, the expected number of white balls in the urn will be half of one hundred, or 50. But then the statistician will drastically change her answer after you add one more ball to the urn, though how she changes it depends on what color ball you add to the urn. If you add a white ball, the statistician says that, based on the new information at her disposal (the color of the new ball), her updated estimate of the number of white balls in the urn at the end of the experiment is not 50 but rather 2/3 of 100, i.e. around 67; but if you add a black ball, she says that her estimate of the number of white balls in the urn at the end of the experiment is not 50 but rather 1/3 of 100, i.e., around 33.
The eavesdropping statistician is going to make a big change in her initial estimate, one way or another. Is she being flaky? Far from it. If anything is flaky, it’s the Pólya urn model itself. The long-term behavior of the model is acutely sensitive to the initial state of the system, as well as to the near-initial conditions that prevail a few time-steps later, so that as new information comes in, a smart statistician will make drastic changes in her long-term predictions about the system’s behavior. Contrast this with what happens when a statistician is observing us simulating the non-Pólya urn, for which the colors of new balls are determined by a coin toss; her revised estimates of how many of the 100 balls will be white at the end of the experiment will change rather slowly as new information comes in.
The fact that the Pólya urn’s behavior is exquisitely sensitive to what happens on the first draw, and only slightly less sensitive to what happens on the second draw, and still quite sensitive to what happens on the third draw, and so on, results in a kind of instability for the long-term behavior of the system: the system has an average-case behavior (as it must), but the system doesn’t hew very closely to its average-case behavior. That’s why, when you run the 100-balls experiment repeatedly, the composition of the urn at the end of each experiment can vary markedly from one experiment to the next. (Again, contrast this with what happens when you’re using the non-Pólya urn: most of the time, at the end of each 100-balls experiment, there’ll be close to 50 white balls and 50 black balls in the urn.)
Let’s bring this point home with some pictures. Figure 2 shows what typically happens if you run the non-Pólya 100-balls experiment lots of times (I ran it a thousand times in computer simulation), starting from the 1-white-and-1-black state, and plot the number of white balls in the urn at the end of the experiment via a histogram (bar graph). The distribution shows a sharp peak near the number 50, corresponding to the fact that in most of the experiments, the number of white balls in the urn at the end of the experiment was close to 50. Compare that with Figure 3, which shows what typically happens if you run the Pólya 100-balls experiment lots of times (again, I did a thousand runs). Now the histogram shows no clear peak at all. Leaving aside statistical fluctuations, the distribution seems to be flat. That is, the number of black balls in the urn at the end of the experiment, which could be anything between 1 and 99, is about equally likely to take on each of its ninety-nine possible values.
In fact, this flatness property holds in a very sharp way. Each of these 99 values has exactly the same chance of cropping up as the number of white balls at the end of a 100-ball Pólya urn experiment. This isn’t a special property of the numbers 99 and 100, either; it applies to Pólya urn experiments of any fixed length.
WHY IS IT FLAT?
There’s a nice way to see that the Pólya urn model, started from 1-white-and-1-black state, satisfies this exact flatness law. I’ll demonstrate the reasoning process for the case where we stop the experiment after the urn has five balls in it, but the argument works much more generally.
Figure 4 schematically shows the first four steps of the process of adding new balls to Pólya’s urn, starting from an urn containing one ball of each color.
Each circle (which I’ll call a “node”) corresponds to a different state of the urn, with w white balls and b black balls for some positive integers w and b. From such a state, there are w ways to choose a white ball (resulting in an extra white ball getting added to the urn) and b ways to choose a black ball (resulting in an extra black ball getting added to the urn). The arrows represent these transitions. So, instead of imagining an urn that keeps getting balls added to it, you can imagine a token that keeps getting slid down the diagram. Instead of having an urn that starts with 1 white ball and 1 black ball, we imagine a token that starts at the WB node (w = 1, b = 1). Then we make a random choice of one of the two arrows leading downward from that node and slide the token in the direction of that arrow. With probability 1/2 we move the token down and to the left (to the WWB node) and with probability 1/2 we move the token down and to the right (to the WBB node). Either way, the token ends up at a new node corresponding to a state of the urn with three balls. Then we make a random choice of one of the three arrows leading downward from that node and slide the token in the direction of that arrow. If the token is at the WWB node, then with probability 2/3 it goes down and to the left and with probability 1/3 it goes down and to the right; but if the token is at the WBB node, then with probability 1/3 it goes down and to the left and with probability 2/3 it goes down and to the right. Either way, the token ends up at a node corresponding to a state of the urn with four balls. Finally, we make a random choice of one of the four arrows leading downward from that node, and move the token to a node at the five-balls level of the diagram.
The total number of arrow-paths the token might follow is 2 x 3 x 4, since there are 2 arrows leading from the top node to nodes in the 3-balls level, and then 3 arrows leading from that node to nodes in the 4-balls level, and then 4 arrows leading from that node to nodes in the 5-balls level. Moreover, each of these 2 x 3 x 4 paths has probability (1/2) x (1/3) x (1/4) of being the arrow-path that the token actually follows. To compute the probability that the token ends up at a particular node on the 5-balls level, it’s enough to count how many paths lead to that node; the probability of the token ending up at that node will just be (1/2) x (1/3) x (1/4) times the number of such paths. But by working backwards from the bottom to the top, we can easily count such paths: no matter which of the nodes in the bottom (5-balls) level we end at, there are 3 arrows leading to it from the 4-balls level, and then 2 arrows leading to that node from the 3-balls level, and then 1 arrow leading to that node from the top node. So the number of paths from the top node to any particular node in the 5-balls level is 3 x 2 x 1, and the probability that the token winds up at that node is (1/2) x (1/3) x (1/4) x (3 x 2 x 1) = 1/4.
RANDOM PROCESSES, WITH THE RANDOMNESS REMOVED
Another way you could see flatness property at the 5-balls level is by doing 12 experiments simultaneously and cheating in a curious way that violates the assumption of randomness and yet, for deep reasons, gives the right answer. See my video demo (or follow the description in the rest of this paragraph). Start with a stack of 12 poker chips at the WB node. Since on average we expect that in half of those experiments the token will move from WB to WWB and in the other half the token will move from WB to WBB, let’s split those 12 chips evenly between the WWB and WBB nodes. If we were doing an honest simulation of the Pólya process, those 12 chips might split 7-5 or 4-8 or even (once in a very blue moon) 12-0. But we’re going to split them 6-6, because that’s what happens on average. Now we’ve got 6 chips at the WWB node and 6 chips at the WBB node. There are 3 arrows leading downward from the WWB node, so of the 6 chips at the WWB node, we can expect on average that 1/3 x 6 = 2 would travel along each arrow: 2+2=4 would go to the WWWB node and 2 would go to the WWBB node. So, just move those 6 chips in that “average-case fashion”, sending 4 to WWWB and 2 to WWBB. Do the same with the 6 chips at the WBB node: send 2 of the 6 along each of the three arrows leading away from the WBB node, sending 2 to WWBB and 4 to WBBB. Now each of the three nodes in the 4-balls level has 4 chips and 4 outgoing arrows, so divide those chips equitably among the outgoing arrows: in terms of going left versus going right, the 4 chips at WWWB split 3-1, the 4 chips at WWBB split 2-2, and the 4 chips at WBBB split 1-3. When we do the tally at the 5-balls level, we find that each node gets 3 chips.
This looks like cheating because it is cheating; we’re not advancing the tokens randomly, but rather “quasi-randomly”. We’re aping the average-case behavior of the Pólya model, but smoothing out the statistical fluctuations. The reason I’m showing you this bogus way to quasi-simulate random processes is that there’s a mathematical principle guaranteeing that, properly interpreted, simulations of the quasirandom version of the Pólya model can teach us things about the random Pólya model. Specifically, the fact that each of the four nodes at the 5-balls level ended up with the same number of chips assures us that if we run the random Pólya model for three steps starting from the node WB, the chance of ending up at any particular node in the 5-balls level is 1/4.
It should be stressed that these nodes are not urns; they represent different possibilities for the color balance within an urn. The chips that we put in the nodes have nothing to do with the balls (or marbles) we put in an urn; instead, when we put 6 chips in a particular node (say that WWB node) this represents the idea of there being 6 experiments that pass through the 2-white-balls-and-1-black-ball state. But really, we don’t need to have an interpretation for the chips. They are just parts of a calculating engine that, at the end of the day, gives us a way to compute quantities that interest us.
This way of derandomizing the Pólya process is a special case of a much more general derandomization idea invented by German mathematician and educator Arthur Engel. It’s commonly called chip-firing, although Engel himself called his invention the probabilistic abacus. He came up with it in conjunction with his efforts to teach the rudiments of probability theory to 4th graders. In Engel’s scheme, we don’t actually start with 12 chips at the WB node; we start with 0. Then we add a chip. Then we add another, so that there are 2 chips at the WB node. Then we “fire” the WB node, sending 1 chip to WWB and the other to WBB. Then other things happen, as shown in my video demo. Here’s what’s going on: anytime a node at the n-balls level (with n = 2, 3, or 4) has n or more chips, we “fire” that node, sending 1 chip along each of the n outgoing arrows from that node. (We won’t fire nodes at the 5-balls level; we call these “absorbing nodes”, and we just let chips accumulate there.) Sometimes firing one node pushes so many chips to an adjacent node that the new node becomes capable of firing. We fire all the nodes we can until no node can be fired at level 2, 3, or 4. (Note: The top level in the diagram is level 2; the bottom level of the diagram is level 5.) Then we add another chip at the WB node, and repeat the procedure. We keep doing this until no chips remain at level 2, 3, or 4. When we’ve done this, the relative preponderance of chips at level 5 tells us exactly what the probability is that the random Pólya urn model would lead to that state. The video shows that when we do this, we end up with 12 chips at level 5, divided as 3+3+3+3.
Once you’ve got the hang of chip-firing, you can apply it to other situations. For instance, let’s go back to an earlier question about starting the Pólya process with an urn containing 2 white balls and 1 black ball. If you start a token at the WWB node and apply the Pólya protocol three times, what are the respective probabilities of arriving at WWWWWB, WWWWBB, WWWBBB, and WWBBBB? You’ll need to extend the diagram of Figure 4 so that it includes a 6-balls level. Start adding chips at the WWB node, using Engel’s rule to fire nodes when possible, and letting chips collect at the 6-balls level. Keep doing it until all the chips you’ve added have found their way to the 6-balls level. If you do it right, you’ll notice a nice pattern governing the heights of the stacks of chips.
Have fun! And if you enjoy the game Engel invented, send your words of appreciation to me and I’ll relay them to him.
For a different explanation of the flatness property of the Pólya urn process, see the probability chapter of “Mathematical Puzzles: A Connoisseur’s Collection” by Peter Winkler.
This article was written with help from Peter Doyle, Sandi Gubin, Henri Piciotto, Saul Propp, and James Tanton. I got the idea for the “1,2,3” trick from the Stack Exchange post on “A problem on Pólya’s urn scheme“, and in particular, the reply by Andreas Blass. The videos were created with the help of Mitch Shuldman of UMass Lowell Media Services.
Lynne L. Doty, K. Peter Krog, and Tracey Baldwin McGrail, Probability and chip firing games, published by DIMACS (Center for Discrete Mathematics and Theoretical Computer Science), 2004.
Arthur Engel, The probabilistic abacus, Educational Studies in Mathematics, Vol. 6, No. 1 (March 1975), pp. 1–22.
Arthur Engel, Why does the probabilistic abacus work?, Educational Studies in Mathematics, Vol. 7, No. 1 (July 1976), pp. 59–69.
J. Laurie Snell, The Engel algorithm for absorbing Markov chains, circa 1979.
Peter Winkler, Mathematical Puzzles: A Connoisseur’s Collection. A.K. Peters, 2003.
When we start from WWB, the histogram is not flat but it is linear. If we stop the process when it reaches the 6-balls level, we find that the respective probabilities of WWWWWB, WWWWBB, WWWBBB, WWBBBB, and WBBBBB are 4/10, 3/10, 2/10, 1/10, and 0/10.
You may also wish to apply chip-firing to the “non-Pólya urn”. Modify Figure 4 so that each node has just two outgoing edges, one going down-and-left and one going down-and-right, corresponding to the fact that the new ball is equally likely to be white or black.
To speed up the chip-firing process, you can use an extra move that allows you to take the current configuration of chips and multiply it by any positive integer you like. That way, you can ensure that at every node with a positive number of chips, the number of chips is a multiple of the number of arrows leading downward from the node. This will enable you to clear out all those nodes without there being any chips left over.
Where chip-firing becomes more useful (but less intuitive) is the analysis of random processes that involve looping, so that a state that has been visited before can be visited again (as in the case of an urn model that allows you to both add balls to the urn and remove balls from it). In that case, you may never be able to push all the chips into the absorbing nodes (the ones that chips are allowed to accumulate in), and it might appear that the chip-firing game will never end and can never return answers to probabilistic questions. In a future essay I’ll show you Engel’s clever way of adapting the chip-firing game to handle situations like this.