Gradient descent, explained simply
Example 2. Effect of learning rate
In this example we run the same gradient descent loop with several learning rates and compare how the weight $w$ evolves.
The goal is to see the difference between a step that is too small (slow convergence), a well-chosen step, and a step so large that the iteration diverges.
Data:
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
Model:
$$\hat{y} = w \cdot x$$
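The training loop below minimizes the mean squared error; its gradient with respect to $w$ follows directly from the model:

$$L(w) = \frac{1}{n}\sum_{i=1}^{n}\left(w x_i - y_i\right)^2, \qquad \frac{\partial L}{\partial w} = \frac{2}{n}\sum_{i=1}^{n} x_i\left(w x_i - y_i\right)$$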
We compare the following learning rates:
lr = 0.01
lr = 0.1
lr = 1.0
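For this particular dataset the update has a closed form that explains the three regimes before running any code. Substituting $\sum_i x_i^2 = 30$ and $\sum_i x_i y_i = 60$ (with $n = 4$) into the gradient gives

$$w \leftarrow w - \mathrm{lr}\cdot(15w - 30) = (1 - 15\,\mathrm{lr})\,w + 30\,\mathrm{lr}$$

so the iteration converges exactly when $|1 - 15\,\mathrm{lr}| < 1$, i.e. $\mathrm{lr} < 2/15 \approx 0.13$. Hence $\mathrm{lr} = 0.01$ converges slowly, $\mathrm{lr} = 0.1$ converges while oscillating around $w = 2$, and $\mathrm{lr} = 1.0$ diverges.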
Example code:
<?php

function train(float $lr, array $x, array $y): array {
    // Start with an initial weight and track how it changes each epoch.
    $w = 0.0;
    $n = count($x);
    $trajectory = [];

    echo PHP_EOL . "Learning rate = $lr" . PHP_EOL;

    for ($epoch = 1; $epoch <= 10; $epoch++) {
        $gradient = 0.0;
        for ($i = 0; $i < $n; $i++) {
            // Compute prediction error for one training example.
            $error = ($w * $x[$i]) - $y[$i];
            $gradient += $x[$i] * $error;
        }
        // Convert the accumulated sum into the mean squared error gradient.
        $gradient = (2 / $n) * $gradient;
        // Update the weight in the direction that reduces the loss.
        $w -= $lr * $gradient;
        // Store the rounded weight so the caller can visualize the path.
        $trajectory[] = [
            'epoch' => $epoch,
            'w'     => round($w, 4),
        ];
        echo "Epoch $epoch: w = " . round($w, 4) . PHP_EOL;
    }

    return $trajectory;
}
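Because the model has a single weight, the whole loop collapses for this dataset to the scalar update $w \leftarrow w - \mathrm{lr}\cdot(15w - 30)$, which makes the three regimes easy to verify numerically. The sketch below uses a hypothetical helper name `finalWeight` (not part of the code above) to run that reduced update:

```php
<?php
// For x = [1, 2, 3, 4] and y = [2, 4, 6, 8]: sum(x^2) = 30, sum(x*y) = 60,
// so the MSE gradient reduces to (2/4) * (30*w - 60) = 15*w - 30.
function finalWeight(float $lr, int $epochs = 10): float {
    $w = 0.0;
    for ($e = 0; $e < $epochs; $e++) {
        // Same update rule as train(), written in closed form.
        $w -= $lr * (15 * $w - 30);
    }
    return $w;
}

echo finalWeight(0.01) . PHP_EOL; // creeps slowly toward 2
echo finalWeight(0.1) . PHP_EOL;  // oscillates but lands very close to 2
echo finalWeight(1.0) . PHP_EOL;  // blows up to a huge magnitude
```

Running `train()` with the same three learning rates prints the same trajectories epoch by epoch; the reduced form just makes the endpoint checkable at a glance.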