Data Transformation with PHP
Encoding Categorical Variables with Rubix
Categorical data, such as "color" or "size", must be converted into numerical form before most machine learning models can process it. One common approach is One-Hot Encoding, which represents each category as a binary vector: every distinct category gets its own column, and each sample has a 1 in the column for its category and a 0 in all the others.
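To make the mechanics concrete, here is a minimal sketch of the idea in plain PHP for a single categorical column. It does not use Rubix and is not the library's implementation. The function name `oneHotEncodeColumn` is hypothetical, and ordering the columns by first appearance is an assumption made for this illustration.

```php
<?php
declare(strict_types=1);

/**
 * One-hot encode a single categorical column (illustrative helper, not a Rubix API).
 * Columns are ordered by the first appearance of each category in $values.
 *
 * @param string[] $values
 * @return array{0: string[], 1: int[][]} the category list and the encoded rows
 */
function oneHotEncodeColumn(array $values): array
{
    // array_unique keeps the first occurrence; array_values reindexes from 0.
    $categories = array_values(array_unique($values));

    // Map each category to its column position.
    $index = array_flip($categories);

    $encoded = [];
    foreach ($values as $value) {
        // Start with all zeros, then set the column of this value's category to 1.
        $row = array_fill(0, count($categories), 0);
        $row[$index[$value]] = 1;
        $encoded[] = $row;
    }

    return [$categories, $encoded];
}

[$categories, $encoded] = oneHotEncodeColumn(['red', 'blue', 'red']);

// Print each encoded row as a string of bits.
foreach ($encoded as $row) {
    echo implode('', $row), PHP_EOL;
}
```

With several categorical columns, the same step would be applied to each column and the resulting vectors concatenated per sample.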
Dataset:
red,small
blue,medium
yellow,medium
green,large
dark,super-large
Result:
Memory: 0.187 Mb
Time running: 0.005 sec.
After Encoding:
--------------
red 100001000
blue 010000100
yellow 001000100
green 000100010
dark 000010001
Each row has nine digits: the first five mark the color (red, blue, yellow, green, dark) and the last four mark the size (small, medium, large, super-large).