Probability as degree of confidence

Case 2. Medical test: updating confidence

When developers hear the word "probability", they often imagine dice, coin flips, and the school formula "favorable outcomes divided by all possible outcomes". This picture is useful but very narrow. In machine learning and applied analytics, probability usually means something else: the degree of our confidence in a statement given the available data.

Scenario

Suppose there is a rare disease. We know the following:

The disease occurs in 1 person out of 1,000.
Test sensitivity (probability of a positive result given the disease) is 99%.
Test specificity (probability of a negative result given a healthy person) is 95%.

A patient takes the test and gets a positive result. Question: what is the probability that they truly have the disease?

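The intuitive answer is often "about 99%", but that is wrong. The reason is the rarity of the disease: healthy people outnumber sick people by roughly 999 to 1, so even a 5% false-positive rate produces far more false positives than true positives.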
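
One way to make this precise is Bayes' theorem. In the sketch below, D stands for "has the disease" and + for "tests positive"; these symbols are shorthand introduced here for illustration and are not part of the scenario itself.

```latex
\begin{align*}
P(D \mid +) &= \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)} \\
            &= \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999} \\
            &= \frac{0.00099}{0.05094} \approx 0.0194
\end{align*}
```

Here P(+ | not D) = 1 - 0.95 = 0.05 is the false-positive rate implied by the stated specificity. A positive result therefore moves the confidence from 0.1% to roughly 1.9%, not to 99%.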

The exact calculation with Bayes' theorem, in PHP:

<?php

$prior = 0.001;        // P(disease)
$sensitivity = 0.99;   // P(positive | disease)
$specificity = 0.95;   // P(negative | healthy)

$falsePositive = 1 - $specificity;  // P(positive | healthy)

$posterior = ($sensitivity * $prior)
    / (($sensitivity * $prior) + ($falsePositive * (1 - $prior)));

echo 'P(disease and positive), true positives: ' . ($sensitivity * $prior) . PHP_EOL;
echo 'P(healthy and positive), false positives: ' . ($falsePositive * (1 - $prior)) . PHP_EOL;
echo 'P(disease | positive): ' . round($posterior, 4) . PHP_EOL;
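
As a cross-check, the same quantity can be obtained by counting people instead of multiplying probabilities. The sketch below assumes a hypothetical cohort of 100,000 people; the cohort size and the variable names are illustrative choices, not part of the original scenario, and any other size gives the same share.

```php
<?php

// Hypothetical cohort size, chosen only to make the counts easy to read.
$population = 100000;

$prior = 0.001;        // P(disease)
$sensitivity = 0.99;   // P(positive | disease)
$specificity = 0.95;   // P(negative | healthy)

$sick = $population * $prior;       // people who have the disease
$healthy = $population - $sick;     // people who do not

// Sick people who correctly test positive.
$truePositives = $sick * $sensitivity;

// Healthy people who wrongly test positive.
$falsePositives = $healthy * (1 - $specificity);

// Share of all positive results that belong to people who are actually sick.
$shareSick = $truePositives / ($truePositives + $falsePositives);

echo 'Sick and positive: ' . round($truePositives) . PHP_EOL;
echo 'Healthy and positive: ' . round($falsePositives) . PHP_EOL;
echo 'Share of positives who are sick: ' . round($shareSick, 4) . PHP_EOL;
```

With these numbers, about 99 sick people and about 4,995 healthy people test positive, so a positive result belongs to a sick person in only about 1.9% of cases. Counting this way makes the effect of rarity visible without any formula.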