Gradient descent, explained simply
Example 3. Plateau and near-zero gradient
Below is the result of running gradient descent on the function $L(w)=(w-3)^4$. Because the fourth power is extremely flat near its minimum, the gradient shrinks rapidly, and the parameter $w$ slows to a crawl as it approaches $w=3$: after 25 epochs it is still stuck near $2.75$.
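Written out, each epoch applies the standard update rule with learning rate $\eta$:

$$w_{t+1} = w_t - \eta\, L'(w_t) = w_t - \eta \cdot 4(w_t - 3)^3.$$

A worked first step, using the same numbers as the run below: with $w_1 = 0$ and $\eta = 0.05$, the gradient is $L'(0) = 4(0-3)^3 = -108$, so $w_2 = 0 - 0.05\cdot(-108) = 5.4$ — a huge, overshooting step, because far from the minimum the cubic gradient is large. Near $w = 3$ the same cubic is almost zero, so the steps all but vanish.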
Example:
<?php
$w = 0.0;
$lr = 0.05;

echo "epoch\tw\t\tgradient\t\tloss\n";

for ($epoch = 1; $epoch <= 25; $epoch++) {
    // Loss function with a very flat region near the minimum.
    $loss = ($w - 3) ** 4;
    // Derivative: d/dw (w - 3)^4 = 4 (w - 3)^3.
    $gradient = 4 * ($w - 3) ** 3;

    echo str_pad($epoch, 8) .
         str_pad(round($w, 5), 16) .
         str_pad(round($gradient, 5), 24) .
         str_pad(round($loss, 5), 15) . "\n";

    // Gradient descent update.
    $w -= $lr * $gradient;
}
Result:
epoch   w               gradient                loss
1       0               -108                    81
2       5.4             55.296                  33.1776
3       2.6352          -0.19419                0.01771
4       2.64491         -0.17909                0.0159
5       2.65386         -0.16588                0.01435
6       2.66216         -0.15424                0.01303
7       2.66987         -0.14392                0.01188
8       2.67707         -0.13471                0.01088
9       2.6838          -0.12646                0.01
10      2.69012         -0.11902                0.00922
11      2.69608         -0.11229                0.00853
12      2.70169         -0.10618                0.00792
13      2.707           -0.10062                0.00737
14      2.71203         -0.09552                0.00688
15      2.71681         -0.09085                0.00643
16      2.72135         -0.08655                0.00603
17      2.72568         -0.08258                0.00566
18      2.7298          -0.0789                 0.00533
19      2.73375         -0.0755                 0.00503
20      2.73752         -0.07233                0.00475
21      2.74114         -0.06938                0.00449
22      2.74461         -0.06663                0.00425
23      2.74794         -0.06406                0.00404
24      2.75114         -0.06165                0.00384
25      2.75423         -0.05938                0.00365
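For contrast, here is a hypothetical follow-up sketch (not part of the original example) showing how second-order information avoids the plateau stall. Newton's method divides the gradient by the second derivative, so on $L(w)=(w-3)^4$ the step becomes $(w-3)/3$ — the error shrinks by a constant factor of $2/3$ per step instead of collapsing along with the gradient:

```php
<?php
// Sketch, assuming we can afford the second derivative of the same loss
// L(w) = (w - 3)^4. Dividing by L''(w) rescales the near-zero gradient on
// the plateau, so the step size no longer vanishes as w approaches 3.
$w = 0.0;
for ($epoch = 1; $epoch <= 25; $epoch++) {
    $gradient = 4 * ($w - 3) ** 3;    // L'(w)
    $hessian  = 12 * ($w - 3) ** 2;   // L''(w)
    if ($hessian == 0) {
        break;                        // already exactly at the minimum
    }
    $w -= $gradient / $hessian;       // Newton step: equals (w - 3) / 3
}
echo "w after 25 Newton steps: " . round($w, 5) . "\n";
// prints: w after 25 Newton steps: 2.99988
```

After 25 Newton steps the error is $3\cdot(2/3)^{25}\approx 0.00012$, versus an error of about $0.25$ for plain gradient descent in the table above.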