What is a model in the mathematical sense?

Learning as minimizing error

If we can measure error (a loss), then training becomes conceptually simple: we change the model's parameters so that this number goes down. The model itself does not “understand” the task; it simply minimizes the loss function we chose.

Below is a minimal example: two observations and a linear model $ŷ = w x + b$, with the squared error $(y_{\text{true}} - ŷ)^2$ as the loss. At first the parameters are poor and the loss is large; then we adjust $w$ so that the predictions move closer to the targets and the loss drops noticeably.

<?php

require_once __DIR__ . '/code.php';

/**
 * Training dataset for a simple 1D regression example.
 *
 * Each row is: [x, yTrue]
 *
 * @var array<int, array{0: float, 1: float}> $dataset
 */
$dataset = [
    [1.0, 2.0],
    [2.0, 4.0],
];

/**
 * Format a float value for prettier output.
 *
 * - If the number is effectively an integer (e.g. 2.0), print it as "2".
 * - Otherwise print a trimmed decimal representation.
 *
 * @param float $value
 * @return string
 */
$formatNumber = function (float $value): string {
    $asInt = (int)$value;

    return ($value === (float)$asInt)
        ? (string)$asInt
        : rtrim(rtrim(number_format($value, 10, '.', ''), '0'), '.');
};

// Bad model (before training)
$model = new LinearModel(w: 0.0, b: 0.0);

foreach ($dataset as [$x, $yTrue]) {
    $yPredicted = $model->predict($x);
    $loss = squaredError($yTrue, $yPredicted);

    echo 'x = ' . $formatNumber($x)
        . ', yTrue = ' . $formatNumber($yTrue)
        . ', yPredicted = ' . $formatNumber($yPredicted)
        . ', loss = ' . $formatNumber($loss) . PHP_EOL;
}

echo PHP_EOL;

// Improved model (after several "training steps")
$model = new LinearModel(w: 0.8, b: 0.0);

foreach ($dataset as [$x, $yTrue]) {
    $yPredicted = $model->predict($x);
    $loss = squaredError($yTrue, $yPredicted);

    echo 'x = ' . $formatNumber($x)
        . ', yTrue = ' . $formatNumber($yTrue)
        . ', yPredicted = ' . $formatNumber($yPredicted)
        . ', loss = ' . $formatNumber($loss) . PHP_EOL;
}
Result: Memory: 0.008 Mb Time running: 0.001 sec.
x = 1, yTrue = 2, yPredicted = 0, loss = 4
x = 2, yTrue = 4, yPredicted = 0, loss = 16

x = 1, yTrue = 2, yPredicted = 0.8, loss = 1.44
x = 2, yTrue = 4, yPredicted = 1.6, loss = 5.76

Training idea: repeat the parameter update steps (for example, with gradient descent) until the average error over the data, the mean squared error $\frac{1}{n}\sum_{i=1}^{n}(y_i - ŷ_i)^2$, becomes small enough.
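Such an update loop can be sketched directly, without relying on code.php. Below is a minimal gradient-descent sketch on the same two observations; the learning rate of 0.1 and the epoch count of 1000 are illustrative choices, not values taken from the article's code.

```php
<?php

// Same two observations as above: y = 2x exactly.
$dataset = [
    [1.0, 2.0],
    [2.0, 4.0],
];

$w = 0.0;
$b = 0.0;
$learningRate = 0.1; // illustrative hyperparameter

for ($epoch = 0; $epoch < 1000; $epoch++) {
    $gradW = 0.0;
    $gradB = 0.0;

    // Gradients of the squared error (yTrue - yPredicted)^2,
    // accumulated over the dataset:
    //   d(loss)/dw = -2 * (yTrue - yPredicted) * x
    //   d(loss)/db = -2 * (yTrue - yPredicted)
    foreach ($dataset as [$x, $yTrue]) {
        $yPredicted = $w * $x + $b;
        $gradW += -2.0 * ($yTrue - $yPredicted) * $x;
        $gradB += -2.0 * ($yTrue - $yPredicted);
    }

    // Step against the average gradient.
    $n = count($dataset);
    $w -= $learningRate * $gradW / $n;
    $b -= $learningRate * $gradB / $n;
}

echo 'w = ' . round($w, 3) . ', b = ' . round($b, 3) . PHP_EOL;
```

On this toy dataset the loop converges to $w \approx 2$, $b \approx 0$, which reproduces $y = 2x$ and drives the loss toward zero.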