Data Cleaning with PHP

Data Standardization with Rubix

If standardization is more appropriate (for instance, when we’re using algorithms such as SVMs that are sensitive to the scale of the features), we can apply the ZScaleStandardizer. It centers each feature at a mean of 0 and rescales it to a standard deviation of 1, which suits techniques such as Support Vector Machines (SVM) and Principal Component Analysis (PCA).

Sample data (four samples, three numerical features each):

[100, 500, 25],
[150, 300, 15],
[200, 400, 20],
[50, 200, 10]
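To make the transformation concrete, here is a minimal plain-PHP sketch of z-score standardization applied column by column to the four sample rows above. It is not Rubix's implementation: the helper name zScoreColumns is hypothetical, and the sketch assumes the population standard deviation (dividing by n rather than n - 1). Each value x in a column becomes (x - mean) / standard deviation.

```php
<?php

/**
 * Standardize each column of a numeric matrix to mean 0 and standard deviation 1.
 * Hypothetical helper for illustration only; assumes population variance (divide by n).
 */
function zScoreColumns(array $samples): array
{
    $n = count($samples);
    $result = $samples;

    foreach (array_keys($samples[0]) as $col) {
        // Collect all values of this feature across the samples
        $column = array_column($samples, $col);
        $mean = array_sum($column) / $n;

        // Population variance: mean of squared deviations from the mean
        $sumSquares = 0.0;
        foreach ($column as $value) {
            $sumSquares += ($value - $mean) ** 2;
        }
        $std = sqrt($sumSquares / $n);

        foreach ($samples as $row => $sample) {
            // Guard against constant columns, whose standard deviation is 0
            $result[$row][$col] = $std > 0.0
                ? ($sample[$col] - $mean) / $std
                : 0.0;
        }
    }

    return $result;
}

$samples = [
    [100, 500, 25],
    [150, 300, 15],
    [200, 400, 20],
    [50, 200, 10],
];

$standardized = zScoreColumns($samples);

print_r($standardized);
```

For the first feature, the mean is 125 and the population standard deviation is about 55.9, so the value 100 maps to roughly -0.447. After the transformation every column has a mean of 0 and a standard deviation of 1, regardless of its original range.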

Example of use:

 
<?php

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Transformers\MinMaxNormalizer;
use Rubix\ML\Transformers\ZScaleStandardizer;

// Create a sample dataset with some numerical features
$samples = [
    [100, 500, 25],
    [150, 300, 15],
    [200, 400, 20],
    [50, 200, 10],
];

$labels = ['A', 'B', 'C', 'D'];

// Create a labeled dataset
$dataset = new Labeled($samples, $labels);

// Apply standardization (fits the standardizer on this dataset, then transforms it in place)
$standardizer = new ZScaleStandardizer();
$dataset->apply($standardizer);

echo "After Standardization: \n";
echo "---------------\n";
print_r($dataset->samples());