“Probability is not really about numbers; it is about the structure of reasoning.”
The type of probability theory taught to undergraduate students, and the one that dominates most of the sciences, is called frequentism. Here’s the basic idea. If you flip a fair coin twice, the result might be heads both times, tails both times, or one of each. If you flip the coin a hundred times, though, you’ll usually get close to half of them landing heads (in fact, you’ll get between 40 and 60 heads about 96% of the time). If you flip the coin a million times, the fraction of heads gets even closer to one half, and the exceptions are even rarer: there’s less than a one in five hundred million chance that the fraction of heads will differ from one half by more than 0.3 percentage points. In the limiting case, where we imagine flipping the coin an infinite number of times, the fraction of flips landing heads converges to exactly one half.
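The figures quoted above can be checked with a short calculation. The Python sketch below computes the 100-flip probability exactly from the binomial distribution and estimates the million-flip case with a normal approximation. It assumes the million-flip claim means the fraction of heads stays within 0.3 percentage points of one half (497,000 to 503,000 heads); the function names are illustrative only.

```python
import math


def prob_heads_between(n, low, high):
    """Exact probability of getting between low and high heads (inclusive)
    in n flips of a fair coin, from the binomial distribution."""
    favourable = sum(math.comb(n, k) for k in range(low, high + 1))
    return favourable / 2 ** n


def prob_fraction_off_by_more_than(n, tolerance):
    """Normal approximation (an assumption, reasonable for large n) to the
    probability that the fraction of heads differs from 1/2 by more than
    `tolerance`."""
    sigma = 0.5 / math.sqrt(n)           # standard deviation of the fraction
    z = tolerance / sigma                # tolerance in standard deviations
    return math.erfc(z / math.sqrt(2))   # two-sided tail, P(|Z| > z)


p_hundred = prob_heads_between(100, 40, 60)
p_million = prob_fraction_off_by_more_than(1_000_000, 0.003)

print(f"100 flips, 40 to 60 heads: {p_hundred:.4f}")
print(f"1,000,000 flips, off by more than 0.3 points: {p_million:.2e}")
```

Under these assumptions the first probability comes out at a little over 0.96 and the second at a little under 2 in a billion, which is consistent with the figures in the text.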
Similarly, if we roll a fair six-sided die once, we might get any number from 1 to 6. If we roll it many, many times, each number will tend to come up about one sixth of the time. If we imagine rolling the die an infinite number of times, each number would come up exactly one sixth of the time. This is one way of defining probability: how often a given outcome happens if you imagine repeating the same test infinitely many times. Because it involves how frequently an outcome will occur in the long run, this is known as the frequentist interpretation of probability.
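The long-run behaviour described here is easy to watch in a simulation. The following is a minimal sketch using Python's pseudo-random number generator as a stand-in for a fair die; the function name, the seed, and the roll counts are arbitrary choices made for illustration.

```python
import random
from collections import Counter


def face_frequencies(num_rolls, seed=0):
    """Roll a simulated fair six-sided die num_rolls times and return the
    fraction of rolls that landed on each face."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    return {face: counts[face] / num_rolls for face in range(1, 7)}


for num_rolls in (60, 6_000, 120_000):
    freqs = face_frequencies(num_rolls)
    worst_gap = max(abs(f - 1 / 6) for f in freqs.values())
    print(f"{num_rolls:>7} rolls: largest gap from 1/6 = {worst_gap:.4f}")
```

With more rolls, the largest gap between any face's observed frequency and 1/6 tends to shrink, which is the pattern the frequentist definition extrapolates to infinity.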
A fundamental problem with this version of probability comes up when we ask exactly what it means to repeat the “same test”. The problem is that from the point of view of the physics involved, there’s actually nothing fundamentally random about a coin toss at all. A tossed coin obeys exact laws of physics that determine precisely how it will land; the outcome is completely determined by things like the trajectory and spin of the coin and the elasticity of the surface it lands on. If you truly repeat exactly the same test, including tossing the coin in exactly the same way every time, it will follow the same path and land with the same side up. The randomness does not come from the coin toss itself; it comes from not knowing all the relevant details of the toss.
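A toy model can make this concrete. The sketch below is not a physical simulation: it assumes a coin that starts heads up, spins at a constant rate, and is caught after a fixed time in the air, so the result depends only on how many half turns it completes. The function name and the ranges of spin rate and air time are hypothetical values picked for illustration.

```python
import math
import random


def toss(spin_rate, air_time):
    """Toy deterministic coin: it starts heads up and flips once per half
    turn, so the result depends only on how many half turns it completes."""
    half_turns = int(spin_rate * air_time / math.pi)
    return "heads" if half_turns % 2 == 0 else "tails"


# Knowing the details exactly: the same toss always lands the same way.
print(toss(200.0, 0.5), toss(200.0, 0.5))

# Not knowing the details: draw them from a range of plausible values
# (hypothetical ranges, chosen only for illustration).
rng = random.Random(0)
num_tosses = 100_000
heads = sum(
    toss(rng.uniform(150.0, 250.0), rng.uniform(0.4, 0.6)) == "heads"
    for _ in range(num_tosses)
)
heads_fraction = heads / num_tosses
print(f"fraction of heads with unknown details: {heads_fraction:.3f}")
```

Nothing in `toss` is random. The roughly even split between heads and tails appears only once the inputs are treated as unknown.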
To think that randomness comes from the coin toss itself is to commit what E. T. Jaynes called the mind projection fallacy. The coin toss is random to you only because you don’t have information about all the variables that affect its outcome. The real source of randomness is your ignorance of the relevant details. Beginning with this realization leads us to a different version of probability, the Bayesian interpretation.
Bayesian probability is a subtler concept than frequentism, but it corresponds better with most people’s intuitions about probability. In the Bayesian interpretation probability represents a degree of belief, instead of the frequency of outcomes. Bayesian probability is relative to a state of information. If you know more of the relevant information, your degree of belief will change accordingly.
“Uncertainty is a personal matter; it is not the uncertainty but your uncertainty.”
A theory about the world is either true or false, and this is problematic for frequentists because it means they can never assign probabilities to theories. Bayesian probability, on the other hand, does allow you to talk about the probability that a theory is true or false. For example, imagine that I roll a fair six-sided die, but I hide the result from everyone’s view. Now I ask a frequentist and a Bayesian the same question: What is the probability that I rolled a four?
Frequentist reply: Probability does not enter into it. The die has already been rolled, and so there is no random process occurring. The die is either showing four with certainty, or showing another number with certainty. Which one it is, I do not know.
Bayesian reply: The die will have landed on four one sixth of the time, and I have no other information about the state of the die, so relative to my state of knowledge, the probability is 1/6.
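The Bayesian reply can be written out as a small calculation in which the answer depends on what is known. In this sketch the extra pieces of information (a hint that the roll was even, and then the die being uncovered) are hypothetical additions to the example, and the helper names are illustrative.

```python
from fractions import Fraction


def probability(event, known_outcomes):
    """Degree of belief in `event`, given that every outcome in
    `known_outcomes` is considered equally plausible."""
    matching = [outcome for outcome in known_outcomes if event(outcome)]
    return Fraction(len(matching), len(known_outcomes))


def is_four(outcome):
    return outcome == 4


# State of information 1: a fair die was rolled, nothing else is known.
all_faces = [1, 2, 3, 4, 5, 6]
print(probability(is_four, all_faces))  # 1/6

# State of information 2 (hypothetical): someone reveals the roll was even.
even_faces = [face for face in all_faces if face % 2 == 0]
print(probability(is_four, even_faces))  # 1/3

# State of information 3 (hypothetical): the die is uncovered and shows a four.
print(probability(is_four, [4]))  # 1
```

The die itself never changes between the three calls. Only the state of information does, and the probability moves with it.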
Ultimately, the whole point of probability and statistics is hypothesis testing: using data to judge an idea’s truth or falsehood. In 1946 Richard Cox stated three properties that we should want a system of reasoning using probability to have.
- The probability of an idea should depend on the relevant information we know.
- It should give the same answers as deductive logic when statements are certainly true or certainly false.
- If there are different ways of calculating a probability, these methods should all agree.
From these criteria he derived a theorem, now known as Cox’s Theorem, which shows that the rules of Bayesian probability can also be interpreted as rules of logic, and that these are the only rules of probabilistic reasoning that will satisfy the three criteria above. This is the deep insight of Bayesian probability: it is the unique way of turning data into beliefs that is consistent with the laws of probability that govern our universe. In other words, Bayesian reasoning isn’t just a normative law, a way of doing inference that simply seems to work well. Bayesian reasoning is a descriptive law about the fundamental way that inference really works.
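For reference, the rules in question are usually written as the product rule and the sum rule, from which Bayes’ theorem follows. The notation below is a common convention rather than something taken from the text: $A$ and $B$ are propositions, $\neg A$ is the negation of $A$, $I$ is the background information, $H$ is a hypothesis, and $D$ is the data.

```latex
\begin{align}
  P(A, B \mid I) &= P(A \mid B, I)\, P(B \mid I)
    && \text{(product rule)} \\
  P(A \mid I) + P(\neg A \mid I) &= 1
    && \text{(sum rule)} \\
  P(H \mid D, I) &= \frac{P(D \mid H, I)\, P(H \mid I)}{P(D \mid I)}
    && \text{(Bayes' theorem)}
\end{align}
```

Bayes’ theorem is obtained by applying the product rule to $P(H, D \mid I)$ in both orders and dividing by $P(D \mid I)$, which must be nonzero.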
Using methods of inference other than Bayesian ones is like using Newton’s laws of motion instead of Einstein’s. Newton’s laws don’t correctly describe how the universe actually works, but in many situations they give the same answers as Einstein’s equations, which are the correct equations. In some situations Newton’s laws give different answers than Einstein’s, and if you use them there you’ll get wrong answers. The domination of frequentism in the sciences may be causing exactly this kind of problem. The inappropriate use of statistics may have led to alarmingly high error rates in the published literature in several fields.
“Nobody contends that all of science is wrong, or that it hasn’t compiled an impressive array of truths about the natural world. Still, any single scientific study alone is quite likely to be incorrect, thanks largely to the fact that the standard statistical system for drawing conclusions is, in essence, illogical.”
In 1975 statistician Dennis Lindley predicted that in the 21st century Bayesian ideas would eventually come to dominate frequentist ones. Today Bayesian methods aren’t just used to find lost submarines, make predictions about climate change, keep spam out of your inbox, and forecast election results, but more importantly they’re revolutionizing the sciences. We’re not quite there yet, but it looks like Lindley’s prediction has a reasonable probability of coming true.
Note: For those interested in Bayesian reasoning, there is no better book on the subject than Probability Theory: The Logic of Science by E. T. Jaynes. The first three chapters can be downloaded for free.