Error, loss functions, and why they are needed

Case 1. MSE and the cost of a big miss

Imagine a service that estimates apartment prices. Nothing too fancy: we input the square footage and get a predicted price. This is a classic regression task, and MSE is almost the default choice of loss function. But this is exactly where we can clearly see the price we pay for that choice.

We can implement MSE in just a few lines of code, without any libraries, and then ruin the whole picture with a single data point. Suppose our dataset now contains a strange apartment: it could be a data error, a unique property, or simply a very bad prediction.

MSE (Mean Squared Error) is one of the most common loss functions for regression tasks. It measures the average squared difference between the model prediction and the true value:

$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

A minimal implementation in plain PHP:

 
<?php

// A simple implementation of MSE (Mean Squared Error).
// We pass in two arrays of the same length:
// $y    — true values (ground‑truth observations),
// $yHat — values predicted by the model.
// The function returns a single number: the average squared error over all observations.
function mse(array $y, array $yHat): float {
    $n   = count($y);
    $sum = 0.0;

    for ($i = 0; $i < $n; $i++) {
        $diff = $y[$i] - $yHat[$i];
        $sum += $diff * $diff;
    }

    // max($n, 1) guards against division by zero on an empty input.
    return $sum / max($n, 1);
}
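To see the squaring at work, we can feed this function two hypothetical datasets. The numbers below are made up for illustration (prices in thousands), and the `mse()` definition is repeated so the snippet runs on its own:

```php
<?php

// Same mse() as defined above, repeated so this snippet is self-contained.
function mse(array $y, array $yHat): float {
    $n   = count($y);
    $sum = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $diff = $y[$i] - $yHat[$i];
        $sum += $diff * $diff;
    }
    return $sum / max($n, 1);
}

// Four predictions, each off by 10 — a modest, uniform error.
$y    = [100.0, 150.0, 200.0, 250.0];
$yHat = [110.0, 140.0, 210.0, 240.0];
echo mse($y, $yHat), PHP_EOL; // 100

// Now the "strange apartment": a single prediction off by 1000.
$y[]    = 300.0;
$yHat[] = 1300.0;
echo mse($y, $yHat), PHP_EOL; // 200080
```

One outlier turns an average error of 100 into 200,080: the squared term of 1,000,000 dominates the sum, which is exactly the sensitivity the text describes.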