Gradient descent on fingers

Example 4. Batch and stochastic descent

In this example we compare two training approaches: batch gradient descent (one update computed over the whole dataset) and stochastic gradient descent (SGD, one update after every single sample).

Goal: understand why batch training converges smoothly and stably while SGD looks noisy and jittery, and why this noise is not a bug but a deliberate algorithmic trade-off.
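Both variants below minimize the same squared error; the only difference is how many samples enter one update. A short derivation (written with the conventional ½ factor so that the 2 cancels on differentiation) shows where the update rule in the code comes from. For the model ŷ = w·x:

```latex
\ell_i(w) = \tfrac{1}{2}\,(w x_i - y_i)^2
\quad\Rightarrow\quad
\frac{d\ell_i}{dw} = x_i\,(w x_i - y_i)
```

Batch gradient descent averages this per-sample gradient over all n samples before taking a step; SGD takes a step immediately using a single dℓᵢ/dw, which is why its trajectory jitters.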

Code example:

 
<?php

// Batch Gradient Descent: one update per epoch, using the
// average gradient over the whole dataset.
$x = [1, 2, 3, 4];
$y = [2, 4, 6, 8]; // y = 2x, so the optimal weight is w = 2

$w  = 0.0;
$lr = 0.1;
$n  = count($x);

echo "Batch GD\n";

define('EPOCHS', 10);

for ($epoch = 1; $epoch <= EPOCHS; $epoch++) {
    // Accumulate the gradient of the squared error over all samples.
    $gradient = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $gradient += $x[$i] * (($w * $x[$i]) - $y[$i]);
    }

    // Average the gradient, then take a single step for the whole epoch.
    $gradient = (1 / $n) * $gradient;
    $w -= $lr * $gradient;

    echo "Epoch $epoch: w = " . round($w, 4) . "\n";
}

echo "\n";

// Stochastic Gradient Descent: an update after every single sample.
$x = [1, 2, 3, 4];
$y = [2, 4, 6, 8];

$w  = 0.0;
$lr = 0.1;
$n  = count($x);

echo "Stochastic GD\n";

for ($epoch = 1; $epoch <= EPOCHS; $epoch++) {
    for ($i = 0; $i < $n; $i++) {
        // Gradient from a single sample: noisy, but cheap.
        $gradient = $x[$i] * (($w * $x[$i]) - $y[$i]);
        $w -= $lr * $gradient;
    }

    echo "Epoch $epoch: w = " . round($w, 4) . "\n";
}
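For this particular dataset we can check what both methods should converge to. For the one-parameter model y ≈ w·x with squared error, the optimum has a closed form, w* = Σxᵢyᵢ / Σxᵢ², so a few lines of the same PHP verify the target both loops are chasing (a quick sanity-check sketch, not part of the training code above):

```php
<?php
// Closed-form optimum for y ≈ w·x under squared error:
// w* = (Σ x_i * y_i) / (Σ x_i^2)
$x = [1, 2, 3, 4];
$y = [2, 4, 6, 8];

$num = 0.0;
$den = 0.0;
for ($i = 0; $i < count($x); $i++) {
    $num += $x[$i] * $y[$i];   // accumulates 2 + 8 + 18 + 32 = 60
    $den += $x[$i] * $x[$i];   // accumulates 1 + 4 + 9 + 16 = 30
}

echo "w* = " . ($num / $den) . "\n"; // prints "w* = 2"
```

Since the data here satisfy y = 2x exactly, every per-sample gradient vanishes at w = 2, so even SGD's jitter dies out as it approaches the optimum; on noisy real-world data the jitter would persist.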