Gradient descent on fingers

Implementation of gradient descent

Implementation in PHP from scratch

Let’s start with a minimal example: one feature, one weight. Estimating an apartment price by its area.

Example of use


<?php

// Training data
$x = [30, 40, 50, 60]; // area in m²
$y = [3, 4, 5, 6];     // price (arbitrary units)

// Model parameters
$w = 0.0; // weight
$b = 0.0; // bias

// Training hyperparameters
$learningRate = 0.0001;
$epochs = 5000;
$n = count($x);

// Gradient descent
for ($epoch = 0; $epoch < $epochs; $epoch++) {

    // Accumulated gradients
    $dw = 0.0;
    $db = 0.0;

    // Iterate over all data points
    for ($i = 0; $i < $n; $i++) {
        // Model prediction
        $yPred = $w * $x[$i] + $b;

        // Prediction error
        // If the error is positive – the model underestimates
        // If the error is negative – the model overestimates
        $error = $y[$i] - $yPred;

        // Derivatives of MSE with respect to w and b
        $dw += -2 * $x[$i] * $error;
        $db += -2 * $error;
    }

    // Average the gradients
    // We compute the average gradient over all points instead of updating after each one.
    // This is classic batch gradient descent.
    $dw /= $n;
    $db /= $n;

    // Update model parameters — gradient descent step
    // We move against the direction of the gradient, because the gradient points where the error increases.
    // A small step leads to more stable training.
    $w -= $learningRate * $dw;
    $b -= $learningRate * $db;
}

echo "w = {$w}, b = {$b}" . PHP_EOL;

Implementation in PHP – vector version

When there are more features, it is more convenient to think in vectors.

Example of use


<?php

// Dot product of two vectors
// Used to compute the prediction: ŷ = w · x
function dot(array $a, array $b): float {
    $sum = 0.0;

    foreach ($a as $i => $v) {
        $sum += $v * $b[$i];
    }

    return $sum;
}

// Feature matrix X
// Each row is a single sample (data point)
// First element is the real feature (area)
// Second element is always 1 — bias term included as a feature
$X = [
    [30, 1],
    [40, 1],
    [50, 1],
    [60, 1],
];

// True targets (dependent variable)
$y = [3, 4, 5, 6];

// Weight vector of the model
// w[0] — weight for area
// w[1] — weight for bias (intercept)
$w = [0.0, 0.0];

// Training hyperparameters
$learningRate = 0.0001;
$epochs = 1000;
$n = count($X);

// Gradient descent
for ($epoch = 0; $epoch < $epochs; $epoch++) {

    // Gradient vector for each weight
    $dw = [0.0, 0.0];

    // Loop over all samples
    for ($i = 0; $i < $n; $i++) {

        // Prediction: dot product of weights and features
        $yPred = dot($w, $X[$i]);

        // Model error on the current sample
        $error = $y[$i] - $yPred;

        // Update gradients for each weight
        // ∂L/∂w_j = -2 * x_j * (y - ŷ)
        foreach ($dw as $j => $_) {
            $dw[$j] += -2 * $X[$i][$j] * $error;
        }
    }

    // Update weights by moving against the gradient
    foreach ($w as $j => $_) {
        $w[$j] -= $learningRate * ($dw[$j] / $n);
    }
}

// Final model weights
print_r($w);