Error, loss functions, and why they are needed

Case 3. Log loss and classifier confidence

In classification tasks, we often care not only about what class the model predicts, but also about how confident it is in that prediction. Log loss (cross-entropy loss) is a natural way to penalize overconfident mistakes and reward well-calibrated probabilities.

Two models may give the same final class labels after thresholding at 0.5, but one of them can still be much better calibrated. Log loss makes this difference visible in a single number.

For binary classification, log loss is defined as:

$\text{LogLoss} = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(p_i) + (1 - y_i) \log(1 - p_i)]$, where $y_i$ is the true label (0 or 1) and $p_i$ is the predicted probability of class 1.
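To get a feel for the numbers, consider a single sample with $y_i = 1$. A confident correct prediction $p_i = 0.9$ contributes $-\log(0.9) \approx 0.105$ to the sum, while a hesitant $p_i = 0.6$ contributes $-\log(0.6) \approx 0.511$. Both predictions are classified correctly after thresholding at 0.5, but the hesitant one costs almost five times more, and a confident wrong prediction such as $p_i = 0.01$ costs $-\log(0.01) \approx 4.6$.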

Implementation example:

<?php

// Binary log loss implementation.
// $yTrue  — array of true labels (0 or 1).
// $yProb  — array of predicted probabilities P(y = 1).
// Returns the average negative log-likelihood of the true labels.
function logLoss(array $yTrue, array $yProb): float {
    $n = count($yTrue);

    if ($n === 0) {
        return 0.0;
    }

    $eps = 1e-15; // small value to avoid log(0)
    $sum = 0.0;

    for ($i = 0; $i < $n; $i++) {
        $sum += logLossTerm($yTrue[$i], $eps, $yProb[$i]);
    }

    return $sum / $n;
}

/**
 * Get the log loss term for a single sample.
 * @param int|float $yTrue
 * @param float $eps
 * @param int|float $yProb
 * @return float
 */
function logLossTerm($yTrue, float $eps, $yProb): float {
    $y = (int)$yTrue;
    $p = max($eps, min(1.0 - $eps, (float)$yProb));

    // For binary classification: -[y * log(p) + (1 - y) * log(1 - p)]
    return -($y * log($p) + (1 - $y) * log(1 - $p));
}
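A quick usage sketch, assuming the logLoss function above (the sample data is made up for illustration): both models below produce identical class labels after thresholding at 0.5, yet their log loss differs sharply.

```php
<?php
// Both models predict the same classes after thresholding at 0.5,
// but model A's probabilities are much closer to the true labels.
$yTrue  = [1, 1, 0, 1];
$probsA = [0.9, 0.8, 0.3, 0.7];    // confident and well calibrated
$probsB = [0.6, 0.55, 0.45, 0.51]; // barely on the right side of 0.5

echo logLoss($yTrue, $probsA), PHP_EOL; // noticeably lower loss than model B
echo logLoss($yTrue, $probsB), PHP_EOL;
```

Model A's lower log loss reflects its better calibration, even though both models have identical accuracy.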