Categorical features and dimensionality
One-hot encoding of colors
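Below is a minimal one-hot encoding example in which each of three colors (red, green, blue) maps to a 3-dimensional vector: the position that corresponds to the color holds 1 and the other two hold 0.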
Example of use
<?php
// Encode a color as a 3-dimensional one-hot vector in the order [red, green, blue].
// Any value outside these three colors yields [0, 0, 0].
function encodeColor(string $color): array {
    return [
        $color === 'red' ? 1 : 0,
        $color === 'green' ? 1 : 0,
        $color === 'blue' ? 1 : 0,
    ];
}

// Named arguments (color: ...) require PHP 8.0 or later.
echo 'Red: [' . implode(', ', encodeColor(color: 'red')) . ']' . PHP_EOL;
echo 'Green: [' . implode(', ', encodeColor(color: 'green')) . ']' . PHP_EOL;
echo 'Blue: [' . implode(', ', encodeColor(color: 'blue')) . ']' . PHP_EOL;
// Red: [1, 0, 0]
// Green: [0, 1, 0]
// Blue: [0, 0, 1]
Result:
Red: [1, 0, 0]
Green: [0, 1, 0]
Blue: [0, 0, 1]
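The key idea is simple: as the number of categories grows, so does the dimensionality of the vector, because each possible value becomes a separate feature. A categorical feature with k distinct values is encoded as k binary features, exactly one of which is 1 for any known value.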
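To make that growth concrete, here is a minimal sketch that generalizes the color example to an arbitrary category list. The helper name oneHot and the extra colors (yellow, purple) are assumptions introduced only for illustration; they do not appear in the example above.

```php
<?php
// Hypothetical helper (an assumption for illustration): one-hot encode
// $value against any list of categories, one position per category.
function oneHot(string $value, array $categories): array {
    // 1 where the category equals the value, 0 everywhere else.
    return array_map(
        fn (string $category): int => $category === $value ? 1 : 0,
        $categories
    );
}

$three = ['red', 'green', 'blue'];
$five  = ['red', 'green', 'blue', 'yellow', 'purple'];

// The vector length always equals the number of categories.
echo count(oneHot('green', $three)) . PHP_EOL;                     // 3
echo count(oneHot('green', $five)) . PHP_EOL;                      // 5
echo '[' . implode(', ', oneHot('green', $five)) . ']' . PHP_EOL;  // [0, 1, 0, 0, 0]
```

Adding two categories adds two dimensions, even though every vector still contains a single 1. With hundreds or thousands of distinct values the same rule produces very wide, mostly zero vectors.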