Gradient descent in plain terms

Gradient descent is a numerical method for finding a local minimum of a function by iteratively stepping in the direction opposite to its gradient (stepping along the gradient instead yields gradient ascent, which seeks a local maximum); it is one of the core numerical methods of modern optimization.
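In generic notation (the symbols here are illustrative, not tied to the examples below), one step of the method moves the current point $x_t$ against the gradient of the objective $f$, scaled by a step size $\eta$ called the learning rate:

$$x_{t+1} = x_t - \eta \, \nabla f(x_t)$$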

When people in machine learning talk about "training a model", they almost always mean the same thing: choosing parameters so that the error becomes as small as possible. Which exact model it is (linear regression, logistic regression, a neural network) matters less; what matters is that under the hood the same mechanism is almost always at work: gradient descent.
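Here is a minimal sketch of that mechanism on a toy one-dimensional least-squares problem; the data, the parameter `w`, and the learning rate `lr` are invented for illustration and do not come from the examples listed below:

```python
import numpy as np

# Toy data: y is roughly 3*x plus noise; we want to recover the slope w.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

w = 0.0   # initial parameter guess
lr = 0.1  # learning rate (step size)

for step in range(200):
    # Mean squared error: L(w) = mean((w*x - y)**2)
    grad = np.mean(2 * (w * x - y) * x)  # dL/dw, computed by hand
    w -= lr * grad                       # step against the gradient

print(w)  # ends up close to the true slope 3.0
```

The same loop, with the gradient computed by automatic differentiation instead of by hand, is essentially what deep learning frameworks run when they train a model.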

  • Implementation of gradient descent
  • Example 1. Parameter trajectory
  • Example 2. Effect of learning rate
  • Example 3. Plateau and near-zero gradient
  • Example 4. Batch and stochastic gradient descent