Data Transformation with PHP

Normalizing and Scaling Numerical Features with Rubix

Normalization rescales each numerical feature to a standard range (often $[0, 1]$), which helps model performance when features are on different scales.
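
As a worked illustration, assuming the usual min-max definition with a target range of $[0, 1]$, each value $x$ of a feature is rescaled using that feature's minimum and maximum:

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
$$

For a feature with values 2000, 2400, and 3000, this gives $x_{\min} = 2000$ and $x_{\max} = 3000$, so 2400 maps to $(2400 - 2000) / (3000 - 2000) = 0.4$, while 2000 and 3000 map to 0 and 1.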

Sample contents of data/numerical.csv (two numeric features followed by a label):

2000,300,low
2400,450,medium
3000,500,high

Example of use:

 
<?php

// Assumption: Composer's autoloader lives in vendor/ next to this script
require_once __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Transformers\MinMaxNormalizer;
use Rubix\ML\Transformers\NumericStringConverter;

// Load the CSV data
$extractor = new CSV(dirname(__FILE__) . '/data/numerical.csv');

// Convert strings to numbers and separate features from labels
$samples = [];
$labels = [];

foreach ($extractor as $row) {
    $samples[] = [
        (int) $row[0],
        (int) $row[1],
    ];
    $labels[] = $row[2];
}

// Create the dataset
$dataset = new Labeled($samples, $labels);

// Fit the normalizer so it learns each feature's minimum and maximum
$normalizer = new MinMaxNormalizer();
$normalizer->fit($dataset);

// transform() modifies the samples array in place
$samples = $dataset->samples();
$labels = $dataset->labels();
$normalizer->transform($samples);

echo "\nNormalized data:\n";
echo "---------------\n";
// Print normalized data with labels in CSV format
foreach ($samples as $ind => $sample) {
    echo implode(',', $sample) . ',' . $labels[$ind] . "\n";
}