Why Naive Bayes works
Case 1. Categorical features and frequencies
Implementation in pure PHP
In this case, we look at the simplest variant of Naive Bayes, one with categorical features. We will explicitly count per-class frequencies, convert them into probabilities, and classify a new user.
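These steps rest on Bayes' theorem plus the "naive" assumption that features are conditionally independent given the class. The denominator P(x_1, ..., x_n) is the same for every class, so classification only needs to compare the numerators.

Here is a worked check on the four-row dataset used in the example below. Take a hypothetical new user with from_ads = true and has_account = true. Each product is P(class) · P(from_ads = true | class) · P(has_account = true | class):

```latex
P(c \mid x_1, \dots, x_n) = \frac{P(c) \prod_{i=1}^{n} P(x_i \mid c)}{P(x_1, \dots, x_n)},
\qquad
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)

\begin{aligned}
\text{buyer:} \quad & \tfrac{2}{4} \cdot \tfrac{1}{2} \cdot \tfrac{2}{2} = 0.25 \\
\text{browser:} \quad & \tfrac{2}{4} \cdot \tfrac{2}{2} \cdot \tfrac{0}{2} = 0
\end{aligned}
```

The browser score is 0 only because no browser in the toy data has an account. With raw frequencies, a single zero count wipes out the whole product.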
Example of use
<?php
// Toy training dataset: each row is a sample with two boolean features and a class label.
$data = [
['class' => 'buyer', 'from_ads' => true, 'has_account' => true],
['class' => 'buyer', 'from_ads' => false, 'has_account' => true],
['class' => 'browser', 'from_ads' => true, 'has_account' => false],
['class' => 'browser', 'from_ads' => true, 'has_account' => false],
];
// Count priors (class frequencies) and conditional feature frequencies per class.
// These counts will be used later to compute P(class) and P(feature=value | class).
$classCounts = [];
$featureCounts = [];
foreach ($data as $row) {
// Count how many samples belong to each class.
$class = $row['class'];
$classCounts[$class] = ($classCounts[$class] ?? 0) + 1;
foreach ($row as $feature => $value) {
if ($feature === 'class') {
continue;
}
// Count how often each feature value occurs within each class.
// Note: in PHP array keys, booleans are cast to 0/1.
$featureCounts[$class][$feature][$value] = ($featureCounts[$class][$feature][$value] ?? 0) + 1;
}
}
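The counting loop above stops at raw counts. The sketch below covers the remaining two steps: turning counts into probabilities and classifying a new user. It repeats the dataset and counting loop so that it runs on its own. The helper `scoreClasses` and the sample `$newUser` are hypothetical, and the probabilities are plain, unsmoothed frequency ratios.

```php
<?php
// Same toy dataset and counting loop as above, repeated so this block is self-contained.
$data = [
    ['class' => 'buyer', 'from_ads' => true, 'has_account' => true],
    ['class' => 'buyer', 'from_ads' => false, 'has_account' => true],
    ['class' => 'browser', 'from_ads' => true, 'has_account' => false],
    ['class' => 'browser', 'from_ads' => true, 'has_account' => false],
];

$classCounts = [];
$featureCounts = [];
foreach ($data as $row) {
    $class = $row['class'];
    $classCounts[$class] = ($classCounts[$class] ?? 0) + 1;
    foreach ($row as $feature => $value) {
        if ($feature === 'class') {
            continue;
        }
        $featureCounts[$class][$feature][$value] = ($featureCounts[$class][$feature][$value] ?? 0) + 1;
    }
}

// Hypothetical helper: unnormalized score P(class) * prod P(feature=value | class),
// using raw frequency ratios (no smoothing).
function scoreClasses(array $sample, array $classCounts, array $featureCounts): array
{
    $total = array_sum($classCounts);
    $scores = [];
    foreach ($classCounts as $class => $count) {
        $score = $count / $total; // prior P(class)
        foreach ($sample as $feature => $value) {
            // PHP casts boolean keys to 0/1, matching how the counts were stored.
            $seen = $featureCounts[$class][$feature][$value] ?? 0;
            $score *= $seen / $count; // likelihood P(feature=value | class)
        }
        $scores[$class] = $score;
    }
    return $scores;
}

// Hypothetical new user: came from ads and already has an account.
$newUser = ['from_ads' => true, 'has_account' => true];
$scores = scoreClasses($newUser, $classCounts, $featureCounts);

// Normalize scores into posterior probabilities (guarding against all-zero scores).
$sum = array_sum($scores);
$posteriors = [];
foreach ($scores as $class => $score) {
    $posteriors[$class] = $sum > 0 ? $score / $sum : 0.0;
}

// The predicted class is the one with the highest score.
arsort($scores);
$predicted = array_key_first($scores);

echo "Predicted class: {$predicted}\n";
print_r($posteriors);
```

With these counts, P(has_account = true | browser) is 0. The posterior for buyer therefore comes out as exactly 1, which is the usual motivation for adding smoothing.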
Implementation with RubixML
Next, we implement the same example with RubixML to see how this logic is expressed at the library level.
Example of use
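<?php
// Assumes Rubix ML was installed with Composer (composer require rubix/ml).
require_once __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Classifiers\NaiveBayes;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Training samples for RubixML NaiveBayes.
// Important: Rubix\ML\Classifiers\NaiveBayes works with categorical (discrete) features,
// so each feature is encoded as a string category rather than a boolean.
$samples = [
['from_ads', 'has_account'],
['organic_search', 'has_account'],
['from_ads', 'no_account'],
['from_ads', 'no_account'],
];
// Class labels for each sample.
$labels = ['buyer', 'buyer', 'browser', 'browser'];
// Build the labeled dataset.
$dataset = new Labeled($samples, $labels);
// Train Naive Bayes model.
$model = new NaiveBayes();
$model->train($dataset);
// Classify a new user who came from ads and already has an account.
$newUsers = new Unlabeled([['from_ads', 'has_account']]);
print_r($model->predict($newUsers));
print_r($model->proba($newUsers));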
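Rubix ML's NaiveBayes applies additive smoothing by default through its `smoothing` constructor parameter (1.0 by default, per the Rubix ML documentation). Its probabilities will therefore generally not match raw frequency ratios, and a zero count does not force a class score to 0.

The sketch below shows the textbook additive (Laplace) smoothing formula in plain PHP with alpha = 1, for comparison. It is an illustration, not Rubix ML's internal code, so Rubix's exact numbers may differ. The helper `smoothedLikelihood`, the hard-coded counts (taken from the four-row toy dataset) and the new user with from_ads = true and has_account = true are all hypothetical.

```php
<?php
// Hypothetical helper: additive (Laplace) smoothing for a categorical likelihood.
// P(feature=value | class) = (count + alpha) / (classCount + alpha * numValues)
function smoothedLikelihood(int $valueCount, int $classCount, int $numValues, float $alpha = 1.0): float
{
    return ($valueCount + $alpha) / ($classCount + $alpha * $numValues);
}

// Counts from the four-row toy dataset; each feature is binary (2 possible values).
// The counts refer to a hypothetical new user with from_ads = true, has_account = true.
$classes = [
    'buyer' => [
        'classCount' => 2,
        'from_ads' => 1,    // buyers who came from ads
        'has_account' => 2, // buyers with an account
    ],
    'browser' => [
        'classCount' => 2,
        'from_ads' => 2,
        'has_account' => 0, // zero count: the unsmoothed likelihood would be 0
    ],
];
$totalSamples = 4;

$scores = [];
foreach ($classes as $class => $counts) {
    $score = $counts['classCount'] / $totalSamples; // prior P(class), left unsmoothed here
    foreach (['from_ads', 'has_account'] as $feature) {
        $score *= smoothedLikelihood($counts[$feature], $counts['classCount'], 2);
    }
    $scores[$class] = $score;
}

// Normalize into posterior probabilities.
$sum = array_sum($scores);
$posteriors = array_map(fn (float $s): float => $s / $sum, $scores);

print_r($posteriors); // buyer approximately 0.667, browser approximately 0.333
```

With alpha = 1, the zero count no longer vetoes the browser class. Buyer still wins, but with a posterior of 2/3 instead of 1.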