Error, loss functions, and why they are needed

Case 5. Training a model as minimizing error

Implementation in pure PHP

Up to this point we used loss as a way to evaluate a model. Now we want to see the key idea: training a model is the process of minimizing a loss function. In this case we will literally "watch" training happen without gradient descent and without library "magic": we take a simple dependency $y = 2x$, define a model $ŷ = w·x$, and search for the $w$ that minimizes MSE.

Example of code


<?php

// Step size for a simple grid search over the weight parameter w.
define('W_STEP', 0.01);

// Model: ŷ = w · x
// For each input x_i we predict y_hat_i.
function predict(array $x, float $w): array {
    return array_map(fn ($xi): float => $w * $xi, $x);
}

// Mean Squared Error (MSE):
// MSE = (1/n) * Σ (y_i - ŷ_i)^2
// This is the loss function we will minimize.
function mse(array $y, array $yHat): float {
    $sum = 0.0;
    $n = count($y);

    if ($n === 0) {
        return 0.0;
    }

    for ($i = 0; $i < $n; $i++) {
        $sum += ($y[$i] - $yHat[$i]) ** 2;
    }

    return $sum / $n;
}

// "Training" in this case is a brute-force search:
// try many values of w and pick the one that gives the smallest MSE.
function findBestW(array $x, array $y, float $from = 0.0, float $to = 3.0, float $step = W_STEP): array {
    $bestW = null;
    $bestLoss = INF;

    for ($w = $from; $w <= $to; $w += $step) {
        $yHat = predict($x, $w);
        $loss = mse($y, $yHat);

        // Keep the best parameter value we have seen so far.
        if ($loss < $bestLoss) {
            $bestLoss = $loss;
            $bestW = $w;
        }
    }

    return [
        'bestW' => $bestW,
        'bestLoss' => $bestLoss,
    ];
}

Implementation in RubixML

Next we do the same with Rubix ML: the library finds the best parameter (weights) that minimizes the error (for linear regression it does so analytically via the least squares method).

Example of code


<?php

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Regressors\Ridge;

// Training data for a simple relationship y = 2x.
$samples = [[1], [2], [3], [4]];
$labels = [2, 4, 6, 8];

// A labeled dataset pairs each sample with its target value.
$dataset = new Labeled($samples, $labels);

// Ridge is linear regression with L2 regularization.
// With a tiny alpha (1e-6) it behaves almost like ordinary least squares.
$model = new Ridge(1e-6);

// Train = fit the best parameters (weights) that minimize error.
$model->train($dataset);