Gradient descent on fingers
Example 4. Batch and stochastic descent
In this example we compare two training approaches: batch gradient descent, which computes each update from the whole dataset, and stochastic gradient descent (SGD), which updates the weight after every single sample.
Goal: understand why batch training looks smooth and stable while SGD looks noisy and jittery, and why this is not a bug but a deliberate algorithmic choice.
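The two update rules can also be written out explicitly. This is a sketch under two assumptions: a one-parameter model that predicts w·x, and a mean squared error loss, which is what the gradient expressions in this example's PHP code correspond to. The symbol η for the learning rate (`$lr` in the code) is notation introduced here, not taken from the original.

```latex
% Assumed loss: mean squared error of the model w * x_i
L(w) = \frac{1}{n} \sum_{i=1}^{n} (w x_i - y_i)^2

% Batch step: gradient averaged over all n samples, one update per epoch
w \leftarrow w - \eta \cdot \frac{2}{n} \sum_{i=1}^{n} x_i (w x_i - y_i)

% SGD step: gradient of sample i alone, n updates per epoch
w \leftarrow w - \eta \cdot 2 x_i (w x_i - y_i)
```

The batch step averages the per-sample gradients, so individual samples that pull in different directions partly cancel. The SGD step follows each sample on its own, which is where the jitter comes from.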
Code example:
<?php
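// Batch gradient descent: one update per epoch, using the gradient
// of the squared error averaged over the whole dataset.
$x = [1, 2, 3, 4];
$y = [2, 4, 6, 8]; // y = 2x, so the optimal weight is w = 2
$w = 0.0;          // initial weight
$lr = 0.1;         // learning rate
$n = count($x);

define('EPOCHS', 10); // number of passes over the data, used by both loops

echo "Batch GD\n";
for ($epoch = 1; $epoch <= EPOCHS; $epoch++) {
    // Accumulate x_i * (w * x_i - y_i) over all samples
    $gradient = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $gradient += $x[$i] * (($w * $x[$i]) - $y[$i]);
    }
    // Average over n; the factor 2 comes from differentiating the square
    $gradient = (2 / $n) * $gradient;
    $w -= $lr * $gradient;
    echo "Epoch $epoch: w = " . round($w, 4) . "\n";
}
echo "\n";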
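// Stochastic gradient descent: one update per sample, so w changes
// n times per epoch. Samples are visited in a fixed order here;
// practical SGD usually shuffles them or picks them at random.
$x = [1, 2, 3, 4];
$y = [2, 4, 6, 8];
$w = 0.0;  // reset the weight so both runs start from the same point
$lr = 0.1;
$n = count($x);

echo "Stochastic GD\n";
for ($epoch = 1; $epoch <= EPOCHS; $epoch++) {
    for ($i = 0; $i < $n; $i++) {
        // Gradient of the squared error for sample i alone
        $gradient = 2 * $x[$i] * (($w * $x[$i]) - $y[$i]);
        $w -= $lr * $gradient;
    }
    // Printed once per epoch, after all n per-sample updates
    echo "Epoch $epoch: w = " . round($w, 4) . "\n";
}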
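The listing prints w only once per epoch, which hides most of the jitter in SGD. A minimal sketch that records w after every single update makes it visible. The helper `sgdTrace()` is a hypothetical name introduced for illustration; the data, learning rate, and update rule are the same as in the example.

```php
<?php
// Hypothetical helper (not part of the original example): runs the same
// per-sample updates as the SGD loop and returns w after each update.
function sgdTrace(array $x, array $y, float $w, float $lr, int $epochs): array
{
    $trace = [];
    $n = count($x);
    for ($epoch = 1; $epoch <= $epochs; $epoch++) {
        for ($i = 0; $i < $n; $i++) {
            // Gradient of the squared error for sample i alone
            $gradient = 2 * $x[$i] * ($w * $x[$i] - $y[$i]);
            $w -= $lr * $gradient;
            $trace[] = $w; // store the weight after this single update
        }
    }
    return $trace;
}

// Two epochs over four samples give eight recorded values
$trace = sgdTrace([1, 2, 3, 4], [2, 4, 6, 8], 0.0, 0.1, 2);
foreach ($trace as $step => $value) {
    echo "Update " . ($step + 1) . ": w = " . round($value, 4) . "\n";
}
```

In the first epoch the weight moves 0.4, 1.68, 2.256, 1.4368: it climbs toward the optimum w = 2, overshoots it on the third sample, and is thrown back below it by the fourth. Samples with larger x take larger steps at the same learning rate, so the path zigzags even though it ends each epoch closer to 2.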