Data Transformation with PHP

Encoding Categorical Variables with Rubix

Categorical data, such as "color" or "size", must be converted into numerical form before most machine learning models can process it. One common approach is One-Hot Encoding, which represents each category as a binary vector: every distinct category gets its own column, holding a 1 if the sample belongs to that category and a 0 if it does not. In the sample below, five colors and four sizes yield a nine-element vector for each row.
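To make the column layout concrete, here is a minimal plain-PHP sketch of the technique. It is not Rubix ML's implementation; the function name oneHotEncode is hypothetical, and the sketch assumes categories are numbered per column in order of first appearance. The sample rows are the ones used in this section.

```php
<?php

declare(strict_types=1);

/**
 * One-hot encode every column of a list of categorical rows.
 * Hypothetical helper for illustration; categories are indexed
 * per column in order of first appearance.
 *
 * @param list<list<string>> $rows
 * @return list<list<int>>
 */
function oneHotEncode(array $rows): array
{
    // Pass 1: build a category => offset map for each column.
    $maps = [];
    foreach ($rows as $row) {
        foreach ($row as $col => $value) {
            $maps[$col] ??= [];
            if (!array_key_exists($value, $maps[$col])) {
                $maps[$col][$value] = count($maps[$col]);
            }
        }
    }

    // Pass 2: replace each value with a 0/1 block and join the blocks.
    $encoded = [];
    foreach ($rows as $row) {
        $vector = [];
        foreach ($row as $col => $value) {
            $bits = array_fill(0, count($maps[$col]), 0);
            $bits[$maps[$col][$value]] = 1;
            $vector = array_merge($vector, $bits);
        }
        $encoded[] = $vector;
    }

    return $encoded;
}

$rows = [
    ['red', 'small'],
    ['blue', 'medium'],
    ['yellow', 'medium'],
    ['green', 'large'],
    ['dark', 'super-large'],
];

// Print the first column as a row label, followed by the encoded vector.
foreach (oneHotEncode($rows) as $i => $vector) {
    echo str_pad($rows[$i][0], 10), implode('', $vector), PHP_EOL;
}
```

The first five digits of each printed vector mark the color and the last four mark the size, so "blue,medium" and "yellow,medium" differ only in the color block.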

Before Encoding:
--------------
red,small
blue,medium
yellow,medium
green,large
dark,super-large
Result: Memory: 0.187 Mb Time running: 0.005 sec.
After Encoding:
--------------
red       100001000
blue      010000100
yellow    001000100
green     000100010
dark      000010001
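
With Rubix ML, an encoding like the one above might be produced along the lines of the following sketch. It assumes the library was installed through Composer and that the autoloader sits at vendor/autoload.php next to the script. The samples are typed in by hand rather than loaded from a file, and the printing loop is illustrative only; it does not measure memory or running time.

```php
<?php

declare(strict_types=1);

// Assumption: Rubix ML was installed with Composer and the autoloader is here.
require __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Transformers\OneHotEncoder;

// The five color/size rows from the sample data.
$samples = [
    ['red', 'small'],
    ['blue', 'medium'],
    ['yellow', 'medium'],
    ['green', 'large'],
    ['dark', 'super-large'],
];

$dataset = new Unlabeled($samples);

// apply() fits the encoder on the dataset if it has not been fitted yet,
// then transforms the samples in place.
$dataset->apply(new OneHotEncoder());

// Print the original color as a row label, followed by the encoded vector.
foreach ($dataset->samples() as $i => $vector) {
    echo str_pad($samples[$i][0], 10), implode('', $vector), PHP_EOL;
}
```

Because the encoder is stateful, the same fitted instance can be kept and reused so that new samples are mapped onto the same columns.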