Probability as degree of confidence

Case 2. Medical test: updating confidence

When developers hear the word "probability", they often imagine dice, coin flips, and the school formula "favorable outcomes divided by all possible outcomes". This is useful, but a very narrow picture. In machine learning and applied analytics, probability almost always means something else – the degree of our confidence in a statement given the available data.

Scenario

Suppose there is a rare disease. We know the following:

  • The disease occurs in 1 person out of 1,000.
  • Test sensitivity (probability of a positive result given the disease) is 99%.
  • Test specificity (probability of a negative result given a healthy person) is 95%.

A patient takes the test and gets a positive result. Question: what is the probability that they truly have the disease?

An intuitive answer often sounds like “about 99%”. But that is wrong. The reason is the rarity of the event.

Example of use:

 
<?php

$prior 
0.001;        // P(disease)
$sensitivity 0.99;   // P(positive | disease)
$specificity 0.95;   // P(negative | healthy)

$falsePositive $specificity;  // P(negative | ealthy)

$posterior = ($sensitivity $prior)
    / ((
$sensitivity $prior) + ($falsePositive * ($prior)));

echo 
"True positive cases: " . ($sensitivity $prior) . PHP_EOL;
echo 
"False positive results: " . ($falsePositive * ($prior)) . PHP_EOL;
echo 
"Probability: " round($posterior4);