Gradient descent in plain terms

Gradient descent is a numerical method for finding a local minimum of a function by iteratively stepping in the direction opposite to its gradient (stepping along the gradient instead yields gradient ascent, which seeks a local maximum); it is one of the core numerical methods of modern optimization.
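In generic notation (the symbols here are illustrative, not tied to the examples below), one step of the method moves the current point $x_t$ against the gradient of the objective $f$, scaled by a step size $\eta$ called the learning rate:

$$x_{t+1} = x_t - \eta \, \nabla f(x_t)$$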

When people in machine learning talk about "training a model", they almost always mean the same thing: choosing parameters so that the error becomes as small as possible. Which exact model it is (linear regression, logistic regression, a neural network) matters less; what matters is that under the hood the same mechanism is almost always at work: gradient descent.
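Here is a minimal sketch of that mechanism on a toy one-dimensional least-squares problem; the data, the parameter `w`, and the learning rate `lr` are invented for illustration and do not come from the examples listed below:

```python
import numpy as np

# Toy data: y is roughly 3*x plus noise; we want to recover the slope w.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

w = 0.0   # initial parameter guess
lr = 0.1  # learning rate (step size)

for step in range(200):
    # Mean squared error: L(w) = mean((w*x - y)**2)
    grad = np.mean(2 * (w * x - y) * x)  # dL/dw, computed by hand
    w -= lr * grad                       # step against the gradient

print(w)  # ends up close to the true slope 3.0
```

The same loop, with the gradient computed by automatic differentiation instead of by hand, is essentially what deep learning frameworks run when they train a model.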

  • Implementation of gradient descent
  • Example 1. Parameter trajectory
  • Example 2. Effect of learning rate
  • Example 3. Plateau and near-zero gradient
  • Example 4. Batch and stochastic gradient descent