Probability as degree of confidence

Case 4. Overconfident model as a problem signal

When developers hear the word "probability", they often imagine dice, coin flips, and the school formula "favorable outcomes divided by all possible outcomes". This picture is useful but very narrow. In machine learning and applied analytics, probability almost always means something else: the degree of our confidence in a statement given the available data.

Case Goal

This case demonstrates an important engineering intuition: extremely high model confidence is not always a good thing. In practice, developers often celebrate when a model outputs probabilities like 0.99 or even 1.0: the model seems “confident”, so it must be working well. In reality, such outputs often signal problems: overfitting, data leakage, overly simple or “dirty” data, or poorly calibrated probabilities. Key idea: a healthy model almost always retains some uncertainty on hard and borderline examples.

 
<?php

// A few model outputs (probability distributions) for multiple inputs.
$predictions = [
    ['spam' => 0.99, 'normal' => 0.01],
    ['spam' => 0.98, 'normal' => 0.02],
    ['spam' => 1.00, 'normal' => 0.00],
    ['spam' => 0.85, 'normal' => 0.15],
    ['spam' => 0.98, 'normal' => 0.02],
];

// Treat extremely high probability as a potential red flag.
$threshold = 0.97;
$flaggedRows = 0;

// Scan every row and print any class whose confidence is above the threshold.
foreach ($predictions as $i => $probs) {
    foreach ($probs as $class => $p) {
        if ($p > $threshold) {
            echo "[$i] high confidence: $class = $p\n";
            $flaggedRows++;
        }
    }
}

// The pattern counts as systemic when at least 80% of rows are flagged.
// The $totalRows check guards against division by zero on an empty batch.
$totalRows = count($predictions);
$isSystemic = $totalRows > 0 && ($flaggedRows / $totalRows) >= 0.8;
Result:
[0] high confidence: spam = 0.99
[1] high confidence: spam = 0.98
[2] high confidence: spam = 1
[4] high confidence: spam = 0.98
Overconfidence threshold: 0.97
Interpretation:

Rows with high confidence: 4 out of 5

Almost all rows fall under this condition, which points to a systemic problem rather than a few isolated cases.
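A high share of confident rows is only a warning sign. Whether the confidence is earned depends on how often the model is actually right. One simple way to check is to compare average confidence with accuracy on labelled data. The sketch below is a minimal illustration, not a full calibration procedure: the `$labelled` array, its `label` field, and the `confidenceGap()` helper are hypothetical, and the numbers are invented for the example.

```php
<?php

// Hypothetical labelled examples: predicted distribution plus the true class.
// The values are invented purely for illustration.
$labelled = [
    ['probs' => ['spam' => 0.99, 'normal' => 0.01], 'label' => 'spam'],
    ['probs' => ['spam' => 0.98, 'normal' => 0.02], 'label' => 'normal'],
    ['probs' => ['spam' => 1.00, 'normal' => 0.00], 'label' => 'spam'],
    ['probs' => ['spam' => 0.97, 'normal' => 0.03], 'label' => 'normal'],
];

/**
 * Returns mean top-class confidence minus accuracy.
 * A clearly positive gap suggests the model is overconfident.
 */
function confidenceGap(array $rows): float
{
    $n = count($rows);
    if ($n === 0) {
        return 0.0; // Nothing to measure on an empty batch.
    }

    $confidenceSum = 0.0;
    $correct = 0;

    foreach ($rows as $row) {
        // The predicted class is the one with the highest probability.
        $confidence = max($row['probs']);
        $predicted = array_search($confidence, $row['probs'], true);

        $confidenceSum += $confidence;
        if ($predicted === $row['label']) {
            $correct++;
        }
    }

    return $confidenceSum / $n - $correct / $n;
}

$gap = confidenceGap($labelled);
printf("Confidence minus accuracy: %.3f\n", $gap);
```

In this invented batch the mean confidence is 0.985 while only two of four predictions are correct, so the gap is large. A well-calibrated model would keep this number close to zero. On real data the gap should be measured on a held-out set, since measuring it on training data can hide exactly the overfitting this case warns about.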