Error, loss functions, and why they are needed

Case 2. Choosing a model via the loss function

This case is a logical continuation of the previous one. There we looked at how a single bad data point can distort the picture; here we answer a different practical question: how to formally choose the better model when several candidates look almost identical by eye.

Case goal:
To show that a loss function turns the subjective "this model seems better" into a measurable criterion that we can actually use to make a decision.

Scenario:
Imagine a product demand forecasting task. We have historical data and two model variants:

  • Model A – a simple linear model, interpretable and stable
  • Model B – a slightly more complex model with additional parameters

Both models are already trained. We do not discuss their internal structure here – in this case only one thing matters: which of them makes fewer mistakes.

 
<?php

require_once __DIR__ . '/code.php';

// Observed (actual) target values from your dataset
$y = [10, 12, 15, 14, 13];

// Predicted values produced by model A for the same inputs
$modelA = [9, 11, 14, 13, 12];

// Predicted values produced by model B for the same inputs
$modelB = [10, 13, 15, 15, 14];

// Compute and print the Mean Squared Error (MSE) for each model.
// The model with the lower MSE is considered to fit the data better.
echo 'MSE A: ' . mse($y, $modelA) . PHP_EOL;
echo 'MSE B: ' . mse($y, $modelB) . PHP_EOL;
Result:
MSE A: 1
MSE B: 0.6
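The `mse()` helper comes from the included code.php, which is not shown here. A minimal sketch of such a function, assuming it takes the array of actual values and the array of predictions and returns the mean of the squared differences, might look like this:

```php
<?php

// Hypothetical sketch of the mse() helper assumed to live in code.php.
// MSE = (1/n) * sum over i of (actual_i - predicted_i)^2
function mse(array $actual, array $predicted): float
{
    $n = count($actual);
    $sum = 0.0;
    foreach ($actual as $i => $value) {
        $diff = $value - $predicted[$i];
        $sum += $diff * $diff;
    }
    return $sum / $n;
}

// Reproduces the numbers above: model A is off by 1 on every point,
// so its MSE is exactly 1; model B has three errors of 1 out of five
// points, giving 3 / 5 = 0.6.
echo mse([10, 12, 15, 14, 13], [9, 11, 14, 13, 12]) . PHP_EOL;
echo mse([10, 12, 15, 14, 13], [10, 13, 15, 15, 14]) . PHP_EOL;
```

Squaring the differences means large errors are penalized disproportionately, which is exactly the "philosophy of error" that MSE encodes.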

After computing the MSE we get two concrete numbers. One of them is smaller, and that is the formal argument that matters for training and model selection. Even if the difference between the MSE values is small, it still reflects a systematic advantage of one model over the other within the chosen philosophy of error.
Even when the curves look almost identical visually, the loss function gives a numerical basis for the choice.