We’re all shit at probability

A four-part guide in mind-bending mathematics

In the weeks following the September 11 terrorist attacks, 1.4 million Americans cancelled their holiday flights, many opting to drive to their destinations instead. Over those same weeks, a thousand more people died on US roads than in the same weeks of the previous year. We’re all aware that driving is much more dangerous than flying. But in the fear and panic that followed the fall of the towers, our ability to assess risk proved defective, resulting in a thousand extra deaths.

Probability, the likelihood that an event will occur, pervades almost every aspect of our lives. It’s a fundamental part of science: when we develop a new drug, we test it to compare the likelihood of it curing your syphilis with the likelihood of it turning the disease into some kind of super-syphilis. If it’s much more likely to be a cure, it will probably be put on shelves. When an insurance company asks whether you’re a 17-year-old lad with a 1991 VN Commodore or an upper-middle-class professional with a Prius, they’re calculating the probability that you’re going to do doughies and write off your car.

Despite probability being indispensable, we’re generally really bad at understanding it. Pop scientist Stephen Jay Gould argued that “misunderstanding of probability may be the greatest of all general impediments to scientific literacy”.

So on a rainy Saturday I decided to do some serious Googling to try to understand why we’re so bad at it. It turns out the answer is really complicated, resting somewhere between the inherent biases and limitations of our cognitive capacities and the contested philosophical foundations of probability itself.

Psychologists have shown our brains work in predictably biased ways. This is why many of us think that in a coin toss a ‘tails’ is more likely to result after a dozen or a hundred ‘heads’ in a row, even though this reasoning is 100% faulty: a fair coin has no memory, so the next toss is still 50–50. Enthusiastic gamblers often develop complex systems based on this kind of thinking, known as the Gambler’s Fallacy. Throw some emotion into the mix and our statistical reasoning gets even worse. This is why we fear Ebola but not heart disease, and snakes but not UV rays, or why we’d rather jump in a car than a plane after terrorist attacks.
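To watch the Gambler’s Fallacy fail in practice, here is a minimal Python sketch. It assumes a perfectly fair simulated coin, and the function name `tails_after_streak`, the streak length of five and the seed are all made up for illustration. It flips the coin half a million times and records what comes up straight after every run of five or more heads.

```python
import random


def tails_after_streak(streak_len=5, flips=500_000, seed=0):
    """Share of tosses landing tails when the previous
    `streak_len` (or more) tosses were all heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    run = 0                    # current number of heads in a row
    after_streak = 0           # tosses made straight after a long streak
    tails_seen = 0             # how many of those came up tails
    for _ in range(flips):
        heads = rng.random() < 0.5   # assumed: a perfectly fair coin
        if run >= streak_len:
            after_streak += 1
            if not heads:
                tails_seen += 1
        run = run + 1 if heads else 0
    return tails_seen / after_streak


if __name__ == "__main__":
    print(f"share of tails after a streak: {tails_after_streak():.3f}")
```

If the fallacy were right, tails would show up noticeably more than half the time after a streak. In this sketch the share should hover around 0.5, same as for any other toss.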

This much is perfectly understandable. But the more I looked into it, the less I felt I understood what probability actually was. It became apparent that the exact nature of ‘probability’ is disputed. If you flip a coin, for example, what actually determines the fact that it has a 50–50 chance of coming up heads? As strange as it might sound, this question is a whole lot more difficult to answer than it first appears. On analysis, making a statement of probability requires making a bunch of assumptions, usually unstated, some of which rest on questionable theoretical grounds. With the coin, we’re assuming there are only two options: a perfect, 2D, theoretical coin that can only land on one side or the other. But what if it were to land on its edge? Or what if it were to turn into a butterfly? In fact, some regular coins, such as pennies or dimes, are irregularly weighted, resulting in consistently uneven odds. To make any statements about probability in this case, we have to ignore the possibility of these other outcomes and factors, which may only sometimes be a reasonable thing to do. The ‘event’ of tossing a coin, then, is something we’ve, in a sense, manufactured. So, in probability, what even is an event? And do events exist in any meaningful way outside our use of them? What makes something likely or unlikely to occur? At this point in the Googling I had to close my MacBook, place my head on the desk, and rest in quiet devastation. I’d gone too deep.

But in my gradual descent from casual curiosity to complete incomprehension, I found a few curious problems of probability. Here they are.



The Texas Sharpshooter Fallacy

This one’s simple. A young gunslinger has just bought his first sixshooter and wants to test his aim. He heads around the back of the barn and lets loose — except he’s no good. He fires rapidly but haphazardly, leaving holes scattered across the barn wall. He stops and bemoans the result.

In one spot, however, he notices a group of holes all bunched together. With a piece of chalk he draws a target around these holes and runs off to his dad to boast of his mad sharpshooting skills. The obvious problem is that he’s actually rubbish. Some of his random shots, by chance alone, formed a cluster, which he now claims were dead on.

Apart from making him a little dishonest, this mightn’t sound like a big deal. But it turns out we actually engage in this type of reasoning all the time. The problem is especially prominent in epidemiology.

In 1998, the New Yorker published an article investigating the rising phenomenon of the ‘cancer cluster’: an unusually high number of people contracting cancer in a specific geographical area. From the 1970s, reports of these supposed clusters had skyrocketed; in 1997, there were around 4000 cluster alarms in Massachusetts alone. The author’s own town was in one of these areas. After a local woman’s son developed cancer, the mother, suspicious of an old nearby dump site, went door knocking looking for other cases. She found 42 people with cancer within just a few blocks of her home. Certainly a cause for concern. What is the likelihood of that many people getting cancer in such a small area? In this case, as with the tens of thousands of other ‘clusters’, scientists never found a common cause.

But except on very rare occasions, cases of cancer are scattered randomly throughout populations. And like the young gunslinger firing randomly at a barn wall, this often produces extreme local concentrations of cancer through chance alone. This doesn’t just explain away most cancer clusters; it makes them inevitable. That being said, if everyone on my block were contracting cancer and my doctor just said, “Texas Sharpshooter Fallacy!”, I’m not sure how consoled I’d be.
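The barn wall can be rebuilt in a few lines of Python. This is only a sketch under invented assumptions: 10,000 cases dropped uniformly at random across 1,000 equally sized blocks. None of these numbers come from the New Yorker article, and `busiest_block` is a hypothetical name. It simply counts how many cases land in the worst-hit block when there is no underlying cause at all.

```python
import random
from collections import Counter


def busiest_block(cases=10_000, blocks=1_000, seed=1):
    """Scatter cases uniformly at random over the blocks and
    return (cases in the worst-hit block, average per block)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randrange(blocks) for _ in range(cases))
    worst = max(counts.values())
    average = cases / blocks
    return worst, average


if __name__ == "__main__":
    worst, average = busiest_block()
    print(f"average cases per block: {average:.0f}")
    print(f"cases in the worst-hit block: {worst}")
```

Even with nothing but chance at work, the worst-hit block should end up well above the average. Draw your chalk circle around that block after the fact and you have yourself a ‘cluster’.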



The Birthday Paradox

The Birthday Paradox would be a good bar trick if it didn’t make you look like an annoying know-it-all. It goes like this: it’s your first day of university, and you’re majoring in statistics. The lecturer arrives for the first class and starts waving around a $5 note. “Who wants to bet me that two people in this room have the same birthday?” You look around. There are 60 or so students in the lecture theatre. Well, there are 365 possible birthdays in a year, so the odds are looking good. A student takes up the wager. The lecturer asks everyone to call out their birthdays, starting with the back left of the room, and for anyone else to call out and stop the game if they share that birthday.

The first person begins. No response. The second, third and fourth continue uninterrupted. On the fifth person, however, someone calls out that, yes, they share that birthday. The lecturer has won. She now holds up a tenner, and asks if anyone wants to play again. There are now six fewer people in the game so the odds are even better for the students. A different student takes up the bet and the game restarts. On the fourth person, the game is interrupted: another shared birthday. Again the lecturer asks if anyone’s willing to play for an orange twenty this time, and the students look at each other with failing confidence.

If this seems unusual to you, you’re not alone. But actually, against our better reasoning, the likelihood of at least two people sharing a birthday in a group of 60 is massively high: over 99%. At 50 people it’s still about 97%. And it’s not until the group dwindles to 23 that the likelihood of two people sharing a birthday falls to roughly 50%.

The reason this seems strange to most of us maths plebs is that our reasoning is egocentric. When we think about sharing birthdays, we automatically just think of ourselves. My birthday is on August 7, so the likelihood that any one random person shares that birthday is pretty low: 1/365, or about 0.27%. But the question wasn’t whether one other person shared my birthday, it was whether any two people, out of a room of 60, shared a birthday. This means that the first person gets to compare their birthday to 59 others, the second compares to 58 others, and so on, for a total of 1,770 different pairs. Because it’s a group, not an individual, being tested, each of those pairs is another chance for a match, leading to almost unbeatable odds for the punting lecturer. It turns out being totally self-centred sometimes has its downfalls.
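The lecturer’s odds can be checked with a short Python sketch. It rests on the usual simplifying assumptions, none of which are guaranteed in a real lecture theatre: 365 equally likely birthdays, no leap years, and no twins in the room. The function names are made up for illustration. The trick is to work out the chance that everyone’s birthday is different, then subtract that from 1.

```python
def shared_birthday_probability(people, days=365):
    """Chance that at least two of `people` share a birthday,
    assuming every one of `days` birthdays is equally likely."""
    p_all_different = 1.0
    for already_taken in range(people):
        # The next person must avoid every birthday already taken.
        p_all_different *= (days - already_taken) / days
    return 1 - p_all_different


def number_of_pairs(people):
    """How many distinct pairs of people can compare birthdays."""
    return people * (people - 1) // 2


if __name__ == "__main__":
    for group in (23, 50, 60):
        chance = shared_birthday_probability(group)
        print(f"{group} people, {number_of_pairs(group)} pairs: {chance:.1%}")
```

The number of pairs grows far faster than the number of people, which is roughly why the odds climb so quickly.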



The Monty Hall Problem

Now this is difficult and I’m only sometimes sure I actually understand how it works.

Here’s the scene: You’re on The Price is Right. It’s the final stage. Larry Emdur presents three doors to you, numbered 1, 2 and 3. Behind one door is that brand new heated La-Z-Boy recliner you’ve been keen on for ages. Behind the other two: used Crocs. You have to pick one, and you win whatever is behind that door and nothing else. No clues are given so you know for certain you’ve got an equal chance no matter which door you select.

You take a punt and select Door 1. “Okay,” Larry replies. “I’m not going to open that door quite yet. First I’m going to open Door 2.” And so he does. Behind it sits a pair of slightly yellowed Crocs.

“Now,” he continues. “Would you like to switch to Door 3 or stick to Door 1?”

You think for a moment. Your odds are certainly better now; there are only two doors to select from, so you have a 1-in-2 chance of getting it right. It makes no difference if you stick or switch. For the sake of consistency, you decide to stay on Door 1.

Were you right in your decision? Actually, no. Counterintuitively, the probability of Door 1 being correct has not increased from 1-in-3 to 1-in-2. It has remained 1-in-3. However, the probability of Door 3 being correct has now increased from 1-in-3 to 2-in-3. The likelihood of the other door being right has doubled. That is, if you change your selection, you are twice as likely to win the coveted recliner.

Now if this doesn’t sound right to you, don’t panic. The Monty Hall Problem was published in Parade magazine in 1990, and generated a huge number of angry letters, some from mathematicians who were later shown to be wrong and publicly flayed. I’ll try to explain it.

When you’re first presented with three doors, there is exactly a 1-in-3 chance of you picking the right one. The vital detail here is that when Larry opens one of the two remaining doors, he knows which door conceals the recliner and will only ever open a Crocs door. If you picked the right door to begin with, only the two Crocs doors remain, so it doesn’t matter which of them Larry opens: switching loses. But there is a 2-in-3 chance the door you initially selected has Crocs behind it, in which case the recliner is behind one of the remaining two. Of these two, Larry can only open the one Crocs door, which leaves the recliner behind the door he didn’t open: switching wins. So if you stay on Door 1, the likelihood of it being the wrong door remains 2-in-3, and that same 2-in-3 is the likelihood of Door 3 being correct. Still don’t get it? Join the club.
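If the words don’t do it for you, counting every case might. The Python sketch below is one way to do that, with some assumptions spelled out: the recliner is equally likely to be behind any door, your first pick is equally likely to be any door, Larry always opens a Crocs door you didn’t pick, and when he has two to choose from he picks either with equal chance. The name `monty_hall_odds` is made up for illustration.

```python
from fractions import Fraction

DOORS = (1, 2, 3)


def monty_hall_odds():
    """Return (chance of winning if you stay, chance if you switch),
    counted exactly over every prize/pick/opened-door combination."""
    stay = switch = Fraction(0)
    for prize in DOORS:
        for pick in DOORS:
            weight = Fraction(1, 9)  # each (prize, pick) combo is equally likely
            # Larry may only open a Crocs door that you did not pick.
            openable = [d for d in DOORS if d != pick and d != prize]
            for opened in openable:
                share = weight / len(openable)  # assumed: Larry picks evenly
                # The door you would move to if you switched.
                other = next(d for d in DOORS if d not in (pick, opened))
                if pick == prize:
                    stay += share
                if other == prize:
                    switch += share
    return stay, switch


if __name__ == "__main__":
    stay, switch = monty_hall_odds()
    print(f"stay: {stay}, switch: {switch}")  # prints "stay: 1/3, switch: 2/3"
```

Nothing here is simulated or rounded: it is the same case-by-case argument as above, just done by a machine that doesn’t get confused.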



Littlewood’s Law

When something insanely unlikely happens we have a tendency to draw meaning from it. Like the soldier who gets shot in the chest but is saved by his bible or hip flask of JD in his coat pocket.

Littlewood’s Law was first described in 1986 and attempts to explain away the occurrence of super improbable, perhaps seemingly supernatural events. It does so by drawing on the ‘Law of Truly Large Numbers’ (which is legit a thing) and making a bunch of openly stated assumptions. First, it reduces all human experience to one ‘event’ per second. That is, while awake, we receive and process one piece of information per second: an itchy elbow, a PT Cruiser racing by, a chirping finch, etc. It then defines a ‘miracle’ as something that happens only once in a million of these events — making it literally a one-in-a-million event. Assuming we’re properly alert for eight hours a day, we process a full million events in around 35 days. This means we can expect to experience a miracle almost every month, making them not so miraculous after all.
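The arithmetic behind that 35 days is easy to check with a small Python sketch using the figures above. One extra step here is my own assumption rather than part of Littlewood’s argument: if each event is treated as an independent one-in-a-million shot, we can also estimate the chance of seeing at least one miracle in a given stretch. The function names are hypothetical.

```python
def littlewood(events_per_second=1, alert_hours_per_day=8,
               miracle_odds=1_000_000):
    """Return (events per day, days until a million events)."""
    events_per_day = events_per_second * 3600 * alert_hours_per_day
    days_per_miracle = miracle_odds / events_per_day
    return events_per_day, days_per_miracle


def chance_of_a_miracle(days, events_per_day=28_800, miracle_odds=1_000_000):
    """Chance of at least one miracle in `days` days, ASSUMING every
    event is an independent one-in-a-million shot."""
    events = days * events_per_day
    return 1 - (1 - 1 / miracle_odds) ** events


if __name__ == "__main__":
    per_day, days = littlewood()
    print(f"{per_day} events a day, a million every {days:.1f} days")
    print(f"chance of a miracle in 35 days: {chance_of_a_miracle(35):.0%}")
```

So one miracle a month is an average, not a guarantee. Under the independence assumption, any given 35-day stretch has a bit under a two-in-three chance of containing at least one.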

If you’re already a skeptic, then you’ll likely interpret unlikely events as simple coincidences, however incredible they might seem. Because the world is massive and there are billions of people doing stuff and our brains are amazing and able to process an astonishing amount of information every waking moment, radically unlikely stuff is certain to happen.

Like one quiet evening in 1950 when fifteen choral singers from a Nebraskan church were each independently late for choir practice and subsequently absent for the accidental explosion of the entire building. Or that the first and last deaths of people working on the Hoover Dam occurred on the same day, thirteen years apart, and were father and son. Or that Mark Twain was born on the day of Halley’s Comet’s appearance in 1835 and died on its next appearance in 1910 — and in the previous year had prophetically said, “I came in with Halley’s Comet in 1835. It is coming again next year, and I expect to go out with it.” Or that guy on the internet who said he kicked a stray orange off the sidewalk on the way to school one morning, rolling it under the wheel of a parked car, and passing the same car on the way home was sprayed with fresh orange juice as the car reversed out the driveway, crushing the orange.

But if you’re more inclined to interpret these kinds of improbable events as supernatural or as having some greater meaning, then you’re likely to think Littlewood is just a massive killjoy.
