Gradient descent, explained simply
Implementation of gradient descent
Implementation in PHP from scratch
Let’s start with a minimal example: one feature, one weight, estimating an apartment’s price from its area.
Example of use
<?php

// Training data
$x = [30, 40, 50, 60]; // area in m²
$y = [3, 4, 5, 6];     // price (arbitrary units)

// Model parameters
$w = 0.0; // weight
$b = 0.0; // bias

// Training hyperparameters
$learningRate = 0.0001;
$epochs = 5000;
$n = count($x);

// Gradient descent
for ($epoch = 0; $epoch < $epochs; $epoch++) {
    // Accumulated gradients
    $dw = 0.0;
    $db = 0.0;

    // Iterate over all data points
    for ($i = 0; $i < $n; $i++) {
        // Model prediction
        $yPred = $w * $x[$i] + $b;

        // Prediction error:
        // if positive, the model underestimates;
        // if negative, the model overestimates
        $error = $y[$i] - $yPred;

        // Derivatives of MSE with respect to w and b
        $dw += -2 * $x[$i] * $error;
        $db += -2 * $error;
    }

    // Average the gradients.
    // We compute the average gradient over all points instead of updating
    // after each one. This is classic batch gradient descent.
    $dw /= $n;
    $db /= $n;

    // Update model parameters — the gradient descent step.
    // We move against the direction of the gradient, because the gradient
    // points where the error increases. A small step leads to more stable training.
    $w -= $learningRate * $dw;
    $b -= $learningRate * $db;
}

echo "w = {$w}, b = {$b}" . PHP_EOL;
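Once training finishes, the learned line can actually be used for something: predicting the price of an apartment that was not in the training set. A minimal sketch (the training loop is a condensed copy of the one above; the 45 m² query is a made-up example, and since the data lies exactly on y = 0.1·x, we expect a prediction close to 4.5):

```php
<?php
// Condensed version of the training loop above
$x = [30, 40, 50, 60];
$y = [3, 4, 5, 6];
$w = 0.0;
$b = 0.0;
$learningRate = 0.0001;
$n = count($x);

for ($epoch = 0; $epoch < 5000; $epoch++) {
    $dw = 0.0;
    $db = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $error = $y[$i] - ($w * $x[$i] + $b);
        $dw += -2 * $x[$i] * $error;
        $db += -2 * $error;
    }
    $w -= $learningRate * $dw / $n;
    $b -= $learningRate * $db / $n;
}

// Predict the price of a 45 m² apartment (a hypothetical query point)
$area = 45;
$price = $w * $area + $b;
echo "Predicted price for {$area} m²: {$price}" . PHP_EOL;
```

Note that the bias converges much more slowly than the weight here: its gradient component is tiny compared to the one scaled by the areas, which is one reason real implementations normalize features first.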
Implementation in PHP – vector version
When there are more features, it is more convenient to think in vectors.
Example of use
<?php

// Dot product of two vectors.
// Used to compute the prediction: ŷ = w · x
function dot(array $a, array $b): float {
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += $v * $b[$i];
    }
    return $sum;
}

// Feature matrix X.
// Each row is a single sample (data point):
// the first element is the real feature (area),
// the second element is always 1 — the bias term included as a feature.
$X = [
    [30, 1],
    [40, 1],
    [50, 1],
    [60, 1],
];

// True targets (dependent variable)
$y = [3, 4, 5, 6];

// Weight vector of the model:
// w[0] — weight for area
// w[1] — weight for bias (intercept)
$w = [0.0, 0.0];

// Training hyperparameters
$learningRate = 0.0001;
$epochs = 1000;
$n = count($X);

// Gradient descent
for ($epoch = 0; $epoch < $epochs; $epoch++) {
    // Gradient vector, one entry per weight
    $dw = [0.0, 0.0];

    // Loop over all samples
    for ($i = 0; $i < $n; $i++) {
        // Prediction: dot product of weights and features
        $yPred = dot($w, $X[$i]);

        // Model error on the current sample
        $error = $y[$i] - $yPred;

        // Update gradients for each weight:
        // ∂L/∂w_j = -2 * x_j * (y - ŷ)
        foreach ($dw as $j => $_) {
            $dw[$j] += -2 * $X[$i][$j] * $error;
        }
    }

    // Update weights by moving against the averaged gradient
    foreach ($w as $j => $_) {
        $w[$j] -= $learningRate * ($dw[$j] / $n);
    }
}

// Final model weights
print_r($w);
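For a toy dataset this small, the optimum can also be computed in closed form and used as a sanity check on what gradient descent converges to. With two weights, the normal equations XᵀX·w = Xᵀy reduce to a 2×2 linear system, solvable by hand with Cramer’s rule. A sketch (this verification is an addition, not part of the original walkthrough; same data as above):

```php
<?php
$X = [[30, 1], [40, 1], [50, 1], [60, 1]];
$y = [3, 4, 5, 6];

// Accumulate XᵀX (a symmetric 2×2 matrix) and Xᵀy (a 2-vector)
$a = 0.0; $b = 0.0; $c = 0.0; // XᵀX = [[a, b], [b, c]]
$p = 0.0; $q = 0.0;           // Xᵀy = [p, q]
foreach ($X as $i => $row) {
    $a += $row[0] * $row[0];
    $b += $row[0] * $row[1];
    $c += $row[1] * $row[1];
    $p += $row[0] * $y[$i];
    $q += $row[1] * $y[$i];
}

// Solve the 2×2 system XᵀX·w = Xᵀy with Cramer's rule
$det = $a * $c - $b * $b;
$w0 = ($p * $c - $b * $q) / $det; // slope
$w1 = ($a * $q - $b * $p) / $det; // intercept

echo "closed form: w = [{$w0}, {$w1}]" . PHP_EOL; // [0.1, 0]
```

The exact solution is w = [0.1, 0], which is what the iterative version approaches; the gap that remains after 1000 epochs is the price paid for the small learning rate.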