Probability as degree of confidence

Case 4. Overconfident model as a problem signal

When developers hear the word "probability", they often imagine dice, coin flips, and the school formula "favorable outcomes divided by all possible outcomes". This picture is useful, but very narrow. In machine learning and applied analytics, probability almost always means something else: the degree of our confidence in a statement given the available data.
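The contrast can be sketched in a few lines. This is an illustrative toy, not a real model: `spamConfidence` is a hypothetical heuristic invented for this example, showing only that a confidence-style probability is conditioned on observed evidence and moves as evidence accumulates.

```php
<?php

// Classical view: favorable outcomes divided by all possible outcomes.
// Probability of rolling an even number on a fair six-sided die.
$classical = count([2, 4, 6]) / 6;
echo $classical, "\n"; // 0.5

// Confidence view: a score conditioned on the evidence we have seen.
// Hypothetical heuristic (not a real model): each spam-typical keyword
// found in an email pushes confidence up, saturating below 1.0.
function spamConfidence(array $keywordsFound): float
{
    return 1.0 - pow(0.5, count($keywordsFound) + 1);
}

echo spamConfidence([]), "\n";                          // 0.5 — no evidence, maximal doubt
echo spamConfidence(['free', 'winner', 'click']), "\n"; // higher — confidence grows with evidence
```

With no evidence the function returns 0.5, i.e. "I don't know" — exactly the kind of preserved uncertainty this case argues a healthy model should show.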

Case Goal

This case demonstrates an important engineering intuition: extremely high model confidence is not always a good thing. In practice, developers often celebrate when a model outputs probabilities like 0.99 or even 1.0. It feels like the model is “confident”, therefore it must be working well. But in reality, this is often a signal of problems: overfitting, data leakage, overly simplistic or “dirty” data, or poorly calibrated probabilities. Key idea: a healthy model almost always doubts and preserves uncertainty on hard and borderline examples.

Code example:

```php
<?php

// A few model outputs (probability distributions) for multiple inputs.
$predictions = [
    ['spam' => 0.99, 'normal' => 0.01],
    ['spam' => 0.98, 'normal' => 0.02],
    ['spam' => 1.00, 'normal' => 0.00],
    ['spam' => 0.85, 'normal' => 0.15],
    ['spam' => 0.98, 'normal' => 0.02],
];
```
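A simple sanity check follows directly from the case's key idea: scan the outputs and flag every distribution whose top probability is suspiciously close to 1.0. This is a minimal sketch; the 0.97 cutoff is an illustrative assumption, not a universal rule, and the sample array mirrors the distributions above.

```php
<?php

// Sample model outputs, mirroring the distributions above.
$predictions = [
    ['spam' => 0.99, 'normal' => 0.01],
    ['spam' => 1.00, 'normal' => 0.00],
    ['spam' => 0.85, 'normal' => 0.15],
];

// Illustrative cutoff: above this, confidence is worth a closer look.
$threshold = 0.97;

foreach ($predictions as $i => $dist) {
    if (max($dist) >= $threshold) {
        printf("Prediction #%d looks overconfident (max p = %.2f)\n", $i, max($dist));
    }
}
```

A flag here does not prove the model is broken; it marks examples worth auditing for overfitting, leakage, or poor calibration before celebrating the "confident" scores.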