Data Cleaning with PHP

Handling Missing Values with Rubix

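Rubix ML provides the MissingDataImputer transformer for handling missing values. It replaces each missing value with a guess produced by a strategy such as Mean, Percentile (the 50th percentile gives the median), or Constant. Consider the following customers.csv, in which ? marks a missing value: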

age,income,spending_score,tag
25,55000,45,yes
32,?,75,yes
40,72000,?,yes
?,82000,60,yes
28,63000,30,yes
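
To make the idea concrete before bringing in the library, here is a minimal plain-PHP sketch of what mean and median imputation do to a single column of the sample data. The function name imputeColumn and its parameters are hypothetical and are not part of Rubix ML; the sketch only illustrates the arithmetic.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: fill missing entries in one numeric column
 * with the mean or the median of the observed values.
 */
function imputeColumn(array $column, string $strategy = 'mean', string $placeholder = '?'): array
{
    // Collect the values that are actually present.
    $observed = [];
    foreach ($column as $value) {
        if ($value !== $placeholder) {
            $observed[] = (float) $value;
        }
    }

    if ($observed === []) {
        throw new InvalidArgumentException('Column has no observed values.');
    }

    if ($strategy === 'median') {
        sort($observed);
        $n = count($observed);
        $mid = intdiv($n, 2);
        // Odd count: middle value. Even count: average of the two middle values.
        $fill = $n % 2 === 1
            ? $observed[$mid]
            : ($observed[$mid - 1] + $observed[$mid]) / 2;
    } else {
        $fill = array_sum($observed) / count($observed);
    }

    // Replace placeholders with the fill value and cast the rest to float.
    return array_map(
        fn ($value) => $value === $placeholder ? $fill : (float) $value,
        $column
    );
}

// The income column from the sample data above.
$income = ['55000', '?', '72000', '82000', '63000'];

print_r(imputeColumn($income, 'mean'));   // the missing entry becomes 68000
print_r(imputeColumn($income, 'median')); // the missing entry becomes 67500
```

The mean and the median differ here (68000 versus 67500); the median is usually the safer choice when a column contains outliers.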

Example of use:

 
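<?php

// Assumption: the Composer autoloader sits next to this script; adjust the path if needed
require_once __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Strategies\Percentile;
use Rubix\ML\Strategies\Prior;
use Rubix\ML\Transformers\MissingDataImputer;

// Load the dataset from CSV; `true` tells the extractor that the first row is a header.
// Labeled::fromIterator() uses the last column (tag) as the label.
$dataset = Labeled::fromIterator(new CSV(__DIR__ . '/data/customers.csv', true));

// Create an imputer with a Percentile strategy for continuous (numeric) features
// and a Prior strategy (guesses a category in proportion to its observed frequency)
// for categorical features.
// Note: Percentile expects p on a 0-100 scale, so 0.55 is the 0.55th percentile;
// pass 55.0 if the 55th percentile is intended.
$imputer = new MissingDataImputer(new Percentile(0.55), new Prior());

// apply() fits the imputer on the dataset and then transforms it in place
$dataset->apply($imputer);

echo "\nAfter Imputation:\n";
echo "---------------\n";
foreach ($dataset->samples() as $sample) {
    echo implode(',', $sample) . "\n";
}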
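
One caveat about the script above: as far as the Rubix ML documentation describes it, the CSV extractor returns every cell as a string, and the imputer treats NAN as the missing marker for continuous features while reserving the ? placeholder for categorical ones. Under that assumption, the numeric columns have to be converted before the Percentile strategy can apply to them. The following is a minimal plain-PHP sketch of such a conversion; the helper name toNumericRow and the column list are hypothetical, not part of the library.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: convert the named columns of one CSV row to numbers,
 * mapping the placeholder to NAN so it can be treated as a continuous missing value.
 */
function toNumericRow(array $row, array $numericColumns, string $placeholder = '?'): array
{
    foreach ($numericColumns as $name) {
        $value = $row[$name];

        if ($value === $placeholder) {
            $row[$name] = NAN;        // missing continuous value
        } elseif (is_numeric($value)) {
            $row[$name] = $value + 0; // numeric string becomes int or float
        }
    }

    return $row;
}

// One row as a header-aware CSV reader would yield it: all strings.
$row = ['age' => '32', 'income' => '?', 'spending_score' => '75', 'tag' => 'yes'];

$clean = toNumericRow($row, ['age', 'income', 'spending_score']);

var_dump($clean);
```

Rows prepared this way can be collected into an array and passed to Labeled::fromIterator() in place of the raw extractor output.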