Case 1: Comparing objects and users
Comparing users and objects: normalization, Euclidean distance, and dot product
Below is a runnable three-step example: feature normalization, Euclidean distance between users, and item relevance scoring with dot product.
Example of use
<?php
require_once __DIR__ . '/code.php';
echo 'Step 1: Feature normalization' . PHP_EOL;
echo '----------' . PHP_EOL;
$userA = [25, 3000, 5];
$userB = [40, 5000, 8];
$min = [18, 1000, 1];
$max = [65, 10000, 20];
$userANorm = normalize($userA, $min, $max);
$userBNorm = normalize($userB, $min, $max);
echo 'userA: ' . array_to_vector($userA) . PHP_EOL;
echo 'userB: ' . array_to_vector($userB) . PHP_EOL;
echo 'normalized userA: ' . array_to_vector($userANorm) . PHP_EOL;
echo 'normalized userB: ' . array_to_vector($userBNorm) . PHP_EOL;
echo 'Expected approx userA: [0.1489, 0.2222, 0.2105]' . PHP_EOL;
echo 'Expected approx userB: [0.4681, 0.4444, 0.3684]' . PHP_EOL . PHP_EOL;
echo 'Step 2: Euclidean distance between users' . PHP_EOL;
echo '----------' . PHP_EOL;
$distance = euclideanDistance($userANorm, $userBNorm);
echo 'distance: ' . $distance . PHP_EOL;
echo 'Expected approx: 0.4197' . PHP_EOL;
echo 'Explanation: d = sqrt((0.1489 - 0.4681)^2 + (0.2222 - 0.4444)^2 + (0.2105 - 0.3684)^2)' . PHP_EOL . PHP_EOL;
echo 'Step 3: Dot product user-item score' . PHP_EOL;
echo '----------' . PHP_EOL;
$item = [0.2, 0.9, 0.7];
$score = dotProduct($userANorm, $item);
echo 'item: ' . array_to_vector($item) . PHP_EOL;
echo 'score: ' . $score . PHP_EOL;
echo 'Expected approx: 0.3771' . PHP_EOL;
echo 'Explanation: score = (0.1489 * 0.2) + (0.2222 * 0.9) + (0.2105 * 0.7)' . PHP_EOL;
Result:
Memory: 0.008 Mb
Time running: 0.001 sec.
Step 1: Feature normalization
----------
userA: [25, 3000, 5]
userB: [40, 5000, 8]
normalized userA: [0.14893617021277, 0.22222222222222, 0.21052631578947]
normalized userB: [0.46808510638298, 0.44444444444444, 0.36842105263158]
Expected approx userA: [0.1489, 0.2222, 0.2105]
Expected approx userB: [0.4681, 0.4444, 0.3684]
Step 2: Euclidean distance between users
----------
distance: 0.41972551439053
Expected approx: 0.4197
Explanation: d = sqrt((0.1489 - 0.4681)^2 + (0.2222 - 0.4444)^2 + (0.2105 - 0.3684)^2)
Step 3: Dot product user-item score
----------
item: [0.2, 0.9, 0.7]
score: 0.37715565509518
Expected approx: 0.3771
Explanation: score = (0.1489 * 0.2) + (0.2222 * 0.9) + (0.2105 * 0.7)
Takeaway: normalization makes feature scales comparable, Euclidean distance shows user proximity, and dot product gives a simple relevance score between a user and an item.