Data Transformation with PHP
Normalizing and Scaling Numerical Features with Rubix
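Normalization rescales numerical features to a common range (often $[0, 1]$). This helps many models, particularly distance-based and gradient-based learners, when features are on very different scales, because no single feature dominates simply by having larger raw values.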
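For a feature column with observed minimum $x_{\min}$ and maximum $x_{\max}$, min-max rescaling to $[0, 1]$ is usually written as below. This is the standard textbook form, given here as an assumption about what is meant rather than taken from the library's source:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

For example, a value of 2400 in a column that ranges from 2000 to 3000 maps to $(2400 - 2000) / (3000 - 2000) = 0.4$.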
Dataset (two numerical features followed by a class label; no header row):
2000,300,low
2400,450,medium
3000,500,high
Example usage:
<?php
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Transformers\MinMaxNormalizer;
// Load the CSV data
$extractor = new CSV(__DIR__ . '/data/numerical.csv');
// Convert strings to numbers and separate features from labels
$samples = [];
$labels = [];
foreach ($extractor as $row) {
    $samples[] = [
        (int)$row[0],
        (int)$row[1],
    ];
    $labels[] = $row[2];
}
// Create the dataset
$dataset = new Labeled($samples, $labels);
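// Fit the normalizer so it learns each column's minimum and maximum
$normalizer = new MinMaxNormalizer();
$normalizer->fit($dataset);
// Copy the samples and labels out of the dataset
$samples = $dataset->samples();
$labels = $dataset->labels();
// transform() takes the array by reference and rescales it in place
$normalizer->transform($samples);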
echo "\nNormalized data:\n";
echo "---------------\n";
// Print normalized data with labels in CSV format
foreach ($samples as $ind => $sample) {
    echo implode(',', $sample) . ',' . $labels[$ind] . "\n";
}
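To check the normalizer's output by hand, the same rescaling can be sketched in plain PHP without the library. The helper name `minMaxScale` is hypothetical and not part of Rubix ML. It assumes every sample has the same number of numeric columns, and it maps a constant column to 0.0 to avoid dividing by zero (a choice made for this sketch, not necessarily what the library does).

```php
<?php
// Hypothetical helper (not part of Rubix ML): rescale each column to [0, 1].
function minMaxScale(array $samples): array
{
    if ($samples === []) {
        return [];
    }

    $columns = count($samples[0]);
    $scaled = $samples;

    for ($col = 0; $col < $columns; $col++) {
        // Collect one column and find its minimum and range
        $values = array_column($samples, $col);
        $min = min($values);
        $range = max($values) - $min;

        foreach ($scaled as $row => $sample) {
            // A constant column has zero range; map it to 0.0 (assumption of this sketch)
            $scaled[$row][$col] = $range == 0 ? 0.0 : ($sample[$col] - $min) / $range;
        }
    }

    return $scaled;
}

// The three rows from the dataset above
$samples = [[2000, 300], [2400, 450], [3000, 500]];
$labels = ['low', 'medium', 'high'];

foreach (minMaxScale($samples) as $i => $sample) {
    echo implode(',', $sample) . ',' . $labels[$i] . "\n";
}
// Prints:
// 0,0,low
// 0.4,0.75,medium
// 1,1,high
```

The library's output should match these values up to floating-point rounding, so small differences in the last printed digits are not a cause for concern.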