Gradient descent, explained simply

Example 2. Effect of learning rate

In this example we run the same gradient descent procedure with several learning rates and compare how the parameter $w$ evolves.

The goal is to see the difference between a step that is too small, a well-chosen step, and a step that is so large it causes divergence.

Data:
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

Model:
$$\hat{y} = w \cdot x$$
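Before running anything, it helps to write out the loss being minimized. With mean squared error (the standard choice for this kind of example, and the one the code's gradient corresponds to), the loss and its derivative with respect to $w$ are:

$$L(w) = \frac{1}{n}\sum_{i=1}^{n}\left(w x_i - y_i\right)^2, \qquad \frac{dL}{dw} = \frac{2}{n}\sum_{i=1}^{n} x_i \left(w x_i - y_i\right)$$

The inner loop of the code accumulates the sum $\sum_i x_i (w x_i - y_i)$ and then scales it by $2/n$.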

We compare the following learning rates:
lr = 0.01
lr = 0.1
lr = 1.0
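For this particular dataset we can even predict which of these rates will diverge (a small side calculation, not part of the run itself). Since $y_i = 2x_i$, $\sum_i x_i^2 = 30$, and $n = 4$, the gradient simplifies, and each update shrinks or expands the error $w - 2$ by a constant factor:

$$\frac{dL}{dw} = \frac{2}{n}\sum_{i} x_i^2\,(w - 2) = 15\,(w - 2), \qquad w_{t+1} - 2 = \left(1 - 15\,\mathrm{lr}\right)\left(w_t - 2\right)$$

Convergence requires $|1 - 15\,\mathrm{lr}| < 1$, i.e. $\mathrm{lr} < 2/15 \approx 0.133$: so 0.01 converges slowly (factor $0.85$), 0.1 converges while oscillating (factor $-0.5$), and 1.0 diverges (factor $-14$).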

Example code:

```php
<?php

function train(float $lr, array $x, array $y): array {
    // Start with an initial weight and track how it changes each epoch.
    $w = 0.0;
    $n = count($x);
    $trajectory = [];

    echo PHP_EOL . "Learning rate = $lr" . PHP_EOL;

    for ($epoch = 1; $epoch <= 10; $epoch++) {
        $gradient = 0.0;

        for ($i = 0; $i < $n; $i++) {
            // Compute prediction error for one training example.
            $error = ($w * $x[$i]) - $y[$i];
            $gradient += $x[$i] * $error;
        }

        // Convert the accumulated value into the mean squared error gradient.
        $gradient = (2 / $n) * $gradient;
        // Update the weight in the direction that reduces the loss.
        $w -= $lr * $gradient;
        // Store the rounded weight so the caller can visualize the path.
        $trajectory[] = [
            'epoch' => $epoch,
            'w' => round($w, 4),
        ];

        echo "Epoch $epoch: w = " . round($w, 4) . PHP_EOL;
    }

    return $trajectory;
}
```
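To run the comparison end to end, here is a compact standalone sketch: it uses the same update rule as `train()` above, inlined into a hypothetical helper `finalWeight()` so the snippet runs on its own.

```php
<?php

/**
 * One gradient-descent run; same update rule as train() above,
 * repeated inline so this sketch is self-contained.
 */
function finalWeight(float $lr, array $x, array $y, int $epochs = 10): float {
    $w = 0.0;
    $n = count($x);
    for ($e = 0; $e < $epochs; $e++) {
        // Accumulate sum of x_i * (w * x_i - y_i) over the dataset.
        $gradient = 0.0;
        for ($i = 0; $i < $n; $i++) {
            $gradient += $x[$i] * (($w * $x[$i]) - $y[$i]);
        }
        // Scale to the MSE gradient and take one step.
        $w -= $lr * (2 / $n) * $gradient;
    }
    return $w;
}

$x = [1, 2, 3, 4];
$y = [2, 4, 6, 8]; // y = 2x, so the optimum is w = 2

foreach ([0.01, 0.1, 1.0] as $lr) {
    printf("lr = %-5s final w = %g%s", $lr, finalWeight($lr, $x, $y), PHP_EOL);
}
```

With these exact inputs, lr = 0.01 creeps to roughly $w \approx 1.61$ after 10 epochs, lr = 0.1 lands at $w \approx 1.998$, and lr = 1.0 overshoots further on every step, ending on the order of $-10^{11}$.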