Data Cleaning with PHP
Handling Missing Values with Rubix
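RubixML provides the MissingDataImputer transformer for handling missing values. It replaces each missing value with a guess from a strategy you supply: one strategy for continuous (numeric) features, such as Mean, Percentile (the 50th percentile is the median), or Constant, and one for categorical features, such as Prior.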
Dataset (data/customers.csv), where ? marks a missing value:
age,income,spending_score,tag
25,55000,45,yes
32,?,75,yes
40,72000,?,yes
?,82000,60,yes
28,63000,30,yes
Example of use:
<?php
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Strategies\Percentile;
use Rubix\ML\Transformers\MissingDataImputer;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Strategies\Prior;
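// Load Composer's autoloader (assumed location; adjust the path to your project layout)
require_once __DIR__ . '/vendor/autoload.php';
// Load the dataset from CSV. The `true` flag treats the first row as a header,
// and Labeled::fromIterator uses the last column (tag) as the label.
// Note: the CSV extractor yields every field as a string, so numeric columns are
// treated as categorical unless converted first (for example with NumericStringConverter).
$dataset = Labeled::fromIterator(new CSV(__DIR__ . '/data/customers.csv', true));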
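// Create the imputer: the first strategy fills continuous (numeric) features,
// the second fills categorical features.
// Percentile takes p on a 0-100 scale, so 55.0 means the 55th percentile.
// Prior guesses a category with probability based on how often it occurs in the fitted data.
$imputer = new MissingDataImputer(new Percentile(55.0), new Prior());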
$dataset->apply($imputer);
echo "\nAfter Imputation:\n";
echo "---------------\n";
// Print each imputed sample as a comma-separated row (labels are not included)
foreach ($dataset->samples() as $sample) {
    echo implode(',', $sample) . "\n";
}
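To show what a continuous strategy does conceptually, here is a minimal plain-PHP sketch that fills the ? entries of the income column from the dataset above with the median of the observed values. It does not call RubixML at all. The helper name imputeMedian is hypothetical, and the library's Percentile strategy may compute its guess differently.

```php
<?php
// Library-independent sketch of median imputation.
// imputeMedian() is a hypothetical helper, not part of RubixML.

function imputeMedian(array $column, string $placeholder = '?'): array
{
    // Collect the observed (non-missing) values as floats.
    $observed = [];
    foreach ($column as $value) {
        if ($value !== $placeholder) {
            $observed[] = (float) $value;
        }
    }

    // With no observed values there is nothing to base a guess on.
    if ($observed === []) {
        return $column;
    }

    // Median: the middle value, or the average of the two middle values.
    sort($observed);
    $n = count($observed);
    $mid = intdiv($n, 2);
    $median = $n % 2 === 1
        ? $observed[$mid]
        : ($observed[$mid - 1] + $observed[$mid]) / 2;

    // Replace each placeholder with the median; cast the rest to float.
    return array_map(
        fn ($value) => $value === $placeholder ? $median : (float) $value,
        $column
    );
}

// The income column from the sample dataset.
$income = ['55000', '?', '72000', '82000', '63000'];
$filled = imputeMedian($income);

echo implode(',', $filled) . "\n";
```

The observed incomes are 55000, 63000, 72000, and 82000, so the missing entry is filled with the average of the two middle values, 67500.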