Bayesian theory provides a principled method for calculating conditional probabilities. With it, we can easily compute conditional probabilities for events where intuition often fails.

Bayesian theory is not only a powerful tool in probability, but is also widely used in machine learning. It underlies methods for fitting probabilistic models to training data, such as Maximum A Posteriori (MAP) estimation, and for building classification models such as the Bayes optimal classifier and Naive Bayes. In this article, you’ll discover Bayes’ Theorem for computing conditional probability and how it is used in machine learning. Some familiarity with the confusion matrix (true/false positives and negatives) will help with the diagnostic-test example later on.

This article will cover:

  • Bayes’ Theorem for conditional probability
  • Explanation of common terminology
  • Example calculations using Bayes’ Theorem
    • Diagnostic testing scenario
    • Manual calculation
    • Python computation

1. Bayes’ Theorem for Conditional Probability

Before diving into Bayes’ Theorem, let’s review marginal, joint, and conditional probabilities.

1.1 General Conditional Probability

Marginal probability is the probability of an event irrespective of the outcomes of other random variables. For a single variable, this is simply the probability of the event; when other variables are involved, it is obtained by summing the joint probability over all possible outcomes of those variables (known as the sum rule).

  • Marginal probability: the probability of an event, such as P(A), regardless of other variables.

Joint probability is the probability of two (or more) events occurring together, such as the probability of event A and event B for two variables X and Y. It is usually written as P(A and B).

  • Joint probability: the probability of two (or more) events occurring simultaneously, such as P(A and B) or P(A, B).

Conditional probability is the probability of one event occurring given that another has occurred. It’s written as P(A | B), the probability of A given B.

  • Conditional probability: the probability of one (or more) events given another, such as P(A | B).

You can compute joint probability using conditional probability, for example:

  • P(A, B) = P(A | B) * P(B)

This is called the product rule. Importantly, joint probability is symmetric:

  • P(A, B) = P(B, A)

Conditional probability can also be computed using joint probability:

  • P(A | B) = P(A, B) / P(B)

However, conditional probability is not symmetric:

  • P(A | B) != P(B | A)
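
To make these rules concrete, here is a minimal Python sketch using a small, made-up joint distribution over two binary events A and B (the numbers are illustrative assumptions, not taken from the article):

# Hypothetical joint distribution over two binary events A and B (illustrative numbers only)
p_joint = {
    (True, True): 0.10,   # P(A and B)
    (True, False): 0.20,  # P(A and not B)
    (False, True): 0.30,  # P(not A and B)
    (False, False): 0.40, # P(not A and not B)
}

# Marginal probabilities via the sum rule
p_a = sum(p for (a, b), p in p_joint.items() if a)  # P(A) = 0.30
p_b = sum(p for (a, b), p in p_joint.items() if b)  # P(B) = 0.40

# Conditional probabilities from the joint: P(A | B) = P(A, B) / P(B)
p_a_given_b = p_joint[(True, True)] / p_b  # 0.10 / 0.40 = 0.25
p_b_given_a = p_joint[(True, True)] / p_a  # 0.10 / 0.30 ≈ 0.33

# Product rule: P(A, B) = P(A | B) * P(B)
assert abs(p_a_given_b * p_b - p_joint[(True, True)]) < 1e-9

print(p_a_given_b, p_b_given_a)  # 0.25 vs ≈ 0.33: P(A | B) != P(B | A)

Note how the joint probability P(A, B) is a single number regardless of order, while the two conditional probabilities differ.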

1.2 Bayes’ Theorem for Conditional Probability

Bayes’ Theorem allows us to compute one conditional probability from another:

  • P(A | B) = P(B | A) * P(A) / P(B)
  • Similarly, P(B | A) = P(A | B) * P(B) / P(A)

This alternative method is useful when joint probability is hard to compute, or when the reverse conditional probability is available.

This form of conditional probability calculation is called Bayes’ Rule or Bayes’ Theorem, named after Reverend Thomas Bayes.

Bayes’ Theorem: A principle for calculating conditional probability without joint probability.
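
As a quick sanity check, here is a minimal sketch with made-up numbers (chosen so the arithmetic is exact; they are assumptions, not values from the article) showing that Bayes’ Theorem recovers the same answer as the direct calculation P(A | B) = P(A, B) / P(B):

# Assumed, illustrative values (chosen so the arithmetic is exact)
p_ab = 0.125  # joint probability P(A, B)
p_a = 0.25    # P(A)
p_b = 0.5     # P(B)

# Reverse conditional: P(B | A) = P(A, B) / P(A)
p_b_given_a = p_ab / p_a  # 0.5

# Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.25, identical to P(A, B) / P(B) = 0.125 / 0.5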

We often can’t directly compute the denominator (P(B)), but we can use:

  • P(B) = P(B | A) * P(A) + P(B | not A) * P(not A)

So the full formula becomes:

  • P(A | B) = P(B | A) * P(A) / [P(B | A) * P(A) + P(B | not A) * P(not A)]

Note: the denominator is just the expansion using the law of total probability.

The remaining terms can be obtained from their complements:

  • P(not A) = 1 – P(A)
  • P(B | not A) = 1 – P(not B | not A)
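
Putting these pieces together, here is a minimal sketch (with assumed, illustrative inputs, not those of the diagnostic example below) of how the denominator is built from the law of total probability:

# Assumed, illustrative inputs (for demonstration only)
p_a = 0.25                 # P(A)
p_b_given_a = 0.8          # P(B | A)
p_not_b_given_not_a = 0.9  # P(not B | not A)

# Complements
p_not_a = 1 - p_a                          # P(not A) = 0.75
p_b_given_not_a = 1 - p_not_b_given_not_a  # P(B | not A) ≈ 0.1

# Law of total probability for the denominator
p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a  # ≈ 0.275

# Full Bayes' Theorem
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # ≈ 0.727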

Now that we’re familiar with the computation, let’s break down the terms in the equation.

2. Explanation of Common Terms

The names given to the terms in Bayes’ Theorem depend on the context in which the equation is used.

Typically, the result P(A | B) is called the posterior, and P(A) is called the prior.

  • P(A | B): posterior probability
  • P(A): prior probability

P(B | A) is called the likelihood, and P(B) is the evidence.

  • P(B | A): likelihood
  • P(B): evidence

Thus, Bayes’ Theorem can be rewritten as:

Posterior = Likelihood * Prior / Evidence

Let’s clarify this with a smoking and cancer example:

Suppose a person smokes. What is the probability they have cancer?

  • P(Cancer | Smoking) = P(Smoking | Cancer) * P(Cancer) / P(Smoking)
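
As a toy illustration, suppose we plug in some entirely hypothetical numbers (the values of P(Cancer), P(Smoking), and P(Smoking | Cancer) below are assumptions for demonstration, not real statistics):

# Hypothetical values for illustration only (not real medical statistics)
p_cancer = 0.01                # prior: P(Cancer)
p_smoking = 0.20               # evidence: P(Smoking)
p_smoking_given_cancer = 0.60  # likelihood: P(Smoking | Cancer)

# Posterior = Likelihood * Prior / Evidence
p_cancer_given_smoking = p_smoking_given_cancer * p_cancer / p_smoking
print(round(p_cancer_given_smoking, 3))  # 0.03, i.e. P(Cancer | Smoking) = 3%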

With this understanding, let’s look at a practical scenario.

3. Bayes’ Theorem Example Calculation

To understand Bayes’ Theorem, we use a medical example involving a diagnostic test.

3.1 Diagnostic Testing Scenario

Medical tests are not perfect. For example, consider a COVID-19 test kit with 99% sensitivity, 98% specificity, and a 0.0001 (0.01%) infection rate in the general population. If someone tests positive, what’s the probability they actually have COVID-19?

3.2 Manual Probability Calculation

Let’s start with the sensitivity: the probability that an infected person tests positive (the true positive rate).

  • P(Test = Positive | COVID = True) = 0.99

Intuitively, we may think a positive test means a 99% chance of infection, but that’s wrong.

This mistake is known as the base rate fallacy: it ignores how low the infection rate (the base rate) is in the general population.

Base rate:

  • P(COVID = True) = 0.0001 (0.01%)

We calculate:

  • P(COVID = True | Test = Positive) = (0.99 * 0.0001) / P(Test = Positive)

Now compute P(Test = Positive):

  • P(Test = Positive) = P(Test = Positive | COVID = True) * P(COVID = True) + P(Test = Positive | COVID = False) * P(COVID = False)

First, compute:

  • P(COVID = False) = 1 – 0.0001 = 0.9999

The specificity is 98%, i.e. the probability that an uninfected person tests negative (the true negative rate), so:

  • P(Test = Negative | COVID = False) = 0.98
  • P(Test = Positive | COVID = False) = 1 – 0.98 = 0.02

Then:

  • P(Test = Positive) = 0.99 * 0.0001 + 0.02 * 0.9999 = 0.000099 + 0.019998 = 0.020097

Finally:

  • P(COVID = True | Test = Positive) = 0.000099 / 0.020097 ≈ 0.00493

So, despite testing positive, there’s only a 0.493% chance the person actually has COVID-19!

This surprising result highlights how real probabilities can defy our intuition.

To use Bayes’ Theorem, we need:

  • Base rate (prior)
  • Sensitivity (true positive rate)
  • Specificity (true negative rate)

Even without P(Test = Positive), we can compute it from the above.

With more specific information about the person (e.g., age or location), the prior can be refined and Bayes’ Theorem will give an even more accurate estimate.

3.3 Python Computation

Here’s a Python implementation of the scenario:

# Given P(A), P(B|A), P(B|not A), compute P(A|B)
def bayes_theorem(p_a, p_b_given_a, p_b_given_not_a):
    # calculate P(not A)
    not_a = 1 - p_a
    # calculate P(B)
    p_b = p_b_given_a * p_a + p_b_given_not_a * not_a
    # calculate P(A|B)
    p_a_given_b = (p_b_given_a * p_a) / p_b
    return p_a_given_b

# P(A): base rate of infection, P(COVID = True)
p_a = 0.0001
# P(B|A): sensitivity, P(Test = Positive | COVID = True)
p_b_given_a = 0.99
# P(B|not A): false positive rate = 1 - specificity, P(Test = Positive | COVID = False)
p_b_given_not_a = 0.02
# Compute P(A|B)
result = bayes_theorem(p_a, p_b_given_a, p_b_given_not_a)
# Print result
print('P(A|B) = %.3f%%' % (result * 100))
# P(A|B) = 0.493%
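
The same function can be reused to see how strongly the base rate drives the result. For example, with a purely hypothetical prior of 10% (say, a patient who already has symptoms) instead of 0.01%, the same test becomes far more informative:

# Hypothetical 10% prior instead of 0.01% (assumed value for illustration only)
print('P(A|B) = %.3f%%' % (bayes_theorem(0.1, 0.99, 0.02) * 100))
# P(A|B) = 84.615%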

Summary

This article started with conditional probability, introduced Bayes’ Theorem, and showed how it corrects our misleading intuition. In the example, even with 99% sensitivity and 98% specificity, and a base rate of 0.01%, a positive test still only yields a 0.493% chance of actual infection.

This demonstrates the importance of using a scientific approach over intuition when interpreting probabilities.