Why Naive Bayes works

MNIST: probabilistic digit classification (Naive Bayes)


MNIST case with RubixML GaussianNB

Run the RubixML version to train GaussianNB on the same toy samples and predict a new normalized image.

 
<?php

use app\classes\MnistLoader;
use Rubix\ML\Classifiers\GaussianNB;
use Rubix\ML\CrossValidation\Metrics\Accuracy;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Extractors\CSV;

try {
    // Build the training and test datasets from the filtered CSV rows.
    $trainRows = MnistLoader::loadIterable('train.csv', categoricalLabels: true, normalize: true, digits: [0, 1]);
    $testRows = MnistLoader::loadIterable('test.csv', categoricalLabels: true, normalize: true, digits: [0, 1]);

    $dataset = Labeled::fromIterator($trainRows);
    $testDataset = Labeled::fromIterator($testRows);
} catch (Exception $e) {
    // Show the loading error as an escaped HTML alert and stop.
    echo '<div class="alert alert-danger" role="alert">' . htmlspecialchars($e->getMessage(), ENT_QUOTES, 'UTF-8') . '</div>';
    exit;
}

// Fit one Gaussian per feature and class on the training set.
$model = new GaussianNB();
$model->train($dataset);

$predictions = [];
$testingLabels = $testDataset->labels();

// Predict each test sample individually and collect the labels.
foreach ($testDataset->samples() as $i => $x) {
    $prediction = $model->predict(new Unlabeled([$x]))[0];
    $predictions[] = $prediction;
}

// Compare the predictions with the true test labels.
$metric = new Accuracy();
$score = $metric->score($predictions, $testingLabels);

echo 'Train samples handled: ' . number_format($dataset->numSamples()) . PHP_EOL;
echo 'Test samples handled: ' . number_format($testDataset->numSamples()) . PHP_EOL . PHP_EOL;
echo 'Accuracy: ' . round($score * 100, 2) . '%';

Sample of digit: 0
Predicted digit: 0

Sample of digit: 1
Predicted digit: 1

Result: Memory: 0 MB. Time running: < 0.001 sec.
Train samples handled: 12,666
Test samples handled: 2,116

Accuracy: 99.15%

RubixML returns the predicted class label for the new sample. Because the data and the assumptions are the same, the result should be almost identical to the pure PHP implementation.
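
To make the "same assumptions" point concrete, the decision rule of a Gaussian Naive Bayes classifier can be sketched as below. This is the standard textbook form, not a transcription of RubixML's source code. The notation is introduced here for illustration: x_j is the j-th normalized pixel of the image, d is the number of pixels, P(c) is the prior of class c, and mu_{cj} and sigma_{cj}^2 are the mean and variance of pixel j estimated from the training samples of class c.

```latex
\hat{c} = \arg\max_{c \in \{0, 1\}}
  \left[ \log P(c) + \sum_{j=1}^{d} \log \mathcal{N}\!\left(x_j \mid \mu_{cj}, \sigma_{cj}^2\right) \right],
\qquad
\log \mathcal{N}\!\left(x \mid \mu, \sigma^2\right)
  = -\tfrac{1}{2} \log\!\left(2 \pi \sigma^2\right) - \frac{(x - \mu)^2}{2 \sigma^2}
```

Two implementations that estimate the same means, variances, and priors from the same data evaluate the same expression. Any remaining differences would then most likely come from details such as how very small variances are smoothed or the order of floating-point operations.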