Data Transformation with PHP

Encoding Categorical Variables with Rubix

Categorical data, such as "color" or "size", must be converted into numerical form before a machine learning model can process it. One common approach is One-Hot Encoding, which represents each category as a binary vector: the encoder creates one column per distinct category and sets that column to 1 when a sample belongs to the category and to 0 otherwise. Unlike assigning each category an integer (red = 1, blue = 2, ...), this does not imply an order or distance between categories that the model could mistakenly learn. For example, the file data/colors_and_size.csv, loaded in the code below, holds one color and one size per row:
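To make the mechanics concrete, here is a minimal plain-PHP sketch that one-hot encodes a single column without any library. The function name oneHotColumn and the choice to number categories in order of first appearance are assumptions of this sketch, not part of Rubix ML's API.

```php
<?php

/**
 * Hypothetical helper (not part of Rubix ML): one-hot encodes one column.
 * Categories are numbered in order of first appearance.
 *
 * @param string[] $values
 * @return array{categories: string[], vectors: int[][]}
 */
function oneHotColumn(array $values): array
{
    // Distinct categories, re-indexed 0..k-1
    $categories = array_values(array_unique($values));
    // Map each category name to its column position
    $position = array_flip($categories);

    $vectors = [];
    foreach ($values as $value) {
        // Start with all zeros, then set the matching category's column to 1
        $vector = array_fill(0, count($categories), 0);
        $vector[$position[$value]] = 1;
        $vectors[] = $vector;
    }

    return ['categories' => $categories, 'vectors' => $vectors];
}

$colors = ['red', 'blue', 'yellow', 'green', 'dark'];
$result = oneHotColumn($colors);

foreach ($result['vectors'] as $i => $vector) {
    echo str_pad($colors[$i], 8) . implode(' ', $vector) . "\n";
}
```

With five distinct colors each vector has five entries: red becomes 1 0 0 0 0 and dark becomes 0 0 0 0 1.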

red,small
blue,medium
yellow,medium
green,large
dark,super-large
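When a sample has several categorical columns, as in the file above, each column is encoded on its own and the resulting binary blocks are joined into one vector. The sketch below shows this in plain PHP on the same five rows. The function name oneHotEncodeRows and the first-appearance ordering are assumptions made for illustration; Rubix ML's OneHotEncoder may order its output columns differently.

```php
<?php

/**
 * Hypothetical helper (not part of Rubix ML): one-hot encodes every column
 * of $rows and concatenates the per-column binary blocks for each row.
 *
 * @param string[][] $rows
 * @return int[][]
 */
function oneHotEncodeRows(array $rows): array
{
    // "Fit": learn each column's categories in order of first appearance
    $positions = [];
    foreach (array_keys($rows[0]) as $column) {
        $categories = array_values(array_unique(array_column($rows, $column)));
        $positions[$column] = array_flip($categories);
    }

    // "Transform": replace each value with a block of 0s holding a single 1
    $encoded = [];
    foreach ($rows as $row) {
        $vector = [];
        foreach ($positions as $column => $position) {
            $block = array_fill(0, count($position), 0);
            $block[$position[$row[$column]]] = 1;
            $vector = array_merge($vector, $block);
        }
        $encoded[] = $vector;
    }

    return $encoded;
}

// The same rows as data/colors_and_size.csv, split from an inline string
$csv = "red,small\nblue,medium\nyellow,medium\ngreen,large\ndark,super-large";
$rows = array_map(fn (string $line): array => explode(',', $line), explode("\n", $csv));

foreach (oneHotEncodeRows($rows) as $i => $vector) {
    echo str_pad($rows[$i][0], 10) . implode('', $vector) . "\n";
}
```

Five colors plus four sizes give 9-digit vectors; with this ordering the first row, red,small, prints as 100001000.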

Example of use:

 
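<?php

// Composer autoloader (assumes Rubix ML was installed with Composer and
// that vendor/ sits next to this script; adjust the path if it does not)
require_once dirname(__FILE__) . '/vendor/autoload.php';

use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Transformers\OneHotEncoder;

// Load the dataset from CSV; false means the file has no header row
$dataset = Unlabeled::fromIterator(new CSV(dirname(__FILE__) . '/data/colors_and_size.csv', false));

// Learn the distinct categories of each categorical column
$encoder = new OneHotEncoder();
$encoder->fit($dataset);

// PHP arrays are copied by value, so $samples keeps the original strings
$samples = $dataset->samples();
$transformedSamples = $samples;

// transform() takes the samples array by reference and rewrites it in place
$encoder->transform($transformedSamples);

echo "\nAfter Encoding:\n";
echo "--------------\n";

foreach ($transformedSamples as $ind => $sample) {
    // Original color, then the binary vector covering both the color and size columns
    echo str_pad($samples[$ind][0], 10) . implode('', $sample) . "\n";
}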