Categorical features and dimensionality

One-hot encoding of colors

Below is a minimal one-hot encoding example where each color maps to a 3-dimensional vector.

 
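<?php

// Map a color name to its one-hot vector over [red, green, blue].
function encodeColor(string $color): array {
    return [
        $color === 'red' ? 1 : 0,
        $color === 'green' ? 1 : 0,
        $color === 'blue' ? 1 : 0,
    ];
}

// Named arguments (color: ...) require PHP 8.0 or later.
echo 'Red:   [' . implode(', ', encodeColor(color: 'red')) . ']' . PHP_EOL;
echo 'Green: [' . implode(', ', encodeColor(color: 'green')) . ']' . PHP_EOL;
echo 'Blue:  [' . implode(', ', encodeColor(color: 'blue')) . ']' . PHP_EOL;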

// Red:   [1, 0, 0]
// Green: [0, 1, 0]
// Blue:  [0, 0, 1]
Result:
Red:   [1, 0, 0]
Green: [0, 1, 0]
Blue:  [0, 0, 1]

The key idea is simple: as the number of categories grows, so does the vector dimensionality, because each possible value becomes a separate feature. With k distinct categories, every encoded vector has k components.
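
To make the growth in dimensionality concrete, here is a minimal sketch that generalizes the color example to an arbitrary category list. The helper name oneHot and the extra colors 'yellow' and 'purple' are assumptions introduced for illustration; they do not appear in the example above.

```php
<?php

// Hypothetical helper (not part of the original example): one-hot encodes
// $value against whatever list of categories is passed in.
function oneHot(string $value, array $categories): array {
    // One position per category: 1 where the category matches, 0 elsewhere.
    return array_map(
        fn(string $category): int => $category === $value ? 1 : 0,
        $categories
    );
}

$three = ['red', 'green', 'blue'];
$five  = ['red', 'green', 'blue', 'yellow', 'purple']; // illustrative additions

// The vector length always equals the number of categories.
echo count(oneHot('green', $three)) . PHP_EOL;        // 3
echo count(oneHot('green', $five)) . PHP_EOL;         // 5
echo implode(', ', oneHot('green', $five)) . PHP_EOL; // 0, 1, 0, 0, 0
```

Adding two categories lengthens every encoded vector by two components, even for values such as 'green' that were already in the list. This is why one-hot encoding can become expensive for features with many distinct values.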