Distances and similarity

Once we represent data as numbers and vectors, a natural question arises: how do we tell whether two objects are similar or, on the contrary, very different? Machine learning almost always comes down to comparison. This text is closer to this one or that one, this user is similar to another user or not, this image belongs to class A or B, and so on.

To formalize such reasoning, we need distance measures and similarity measures. In math and ML these are not abstract terms but concrete functions that take two vectors and return a number. The algorithm then makes decisions based on that number.

Euclidean distance, dot product, cosine similarity
Case 1: Comparing objects and users
Case 2: Estimating object relevance
Case 3: Comparing texts