Why Naive Bayes works
Case 1. Categorical features and frequencies
Implementation in pure PHP
In this example we implement Naive Bayes by hand for a pair of simple categorical features and see what log-probability scores we get for each class.
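The usage script below includes code.php, which is not listed in this section. As an assumption, here is a minimal sketch of what it could define: the variable names $data, $classCounts and $featureCounts are taken from the usage code, while the concrete training samples are purely illustrative.

<?php
// code.php: an assumed reconstruction for illustration, not the original file.
// Hypothetical training set: two boolean features plus a class label per sample.
$data = [
    ['from_ads' => true,  'has_account' => true,  'class' => 'buyer'],
    ['from_ads' => true,  'has_account' => false, 'class' => 'buyer'],
    ['from_ads' => false, 'has_account' => true,  'class' => 'buyer'],
    ['from_ads' => false, 'has_account' => false, 'class' => 'browser'],
    ['from_ads' => true,  'has_account' => false, 'class' => 'browser'],
];

// $classCounts[class] = number of training samples with that class.
$classCounts = [];
// $featureCounts[class][feature][value] = how many samples of the class
// have feature=value (boolean values are cast to 0/1 array keys).
$featureCounts = [];

foreach ($data as $sample) {
    $class = $sample['class'];
    $classCounts[$class] = ($classCounts[$class] ?? 0) + 1;
    foreach ($sample as $feature => $value) {
        if ($feature === 'class') {
            continue; // The label itself is not a feature.
        }
        $valueKey = (int)$value;
        $featureCounts[$class][$feature][$valueKey] =
            ($featureCounts[$class][$feature][$valueKey] ?? 0) + 1;
    }
}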
Example of use
<?php
include __DIR__ . '/code.php';
// Classify a new sample using Naive Bayes.
$input = ['from_ads' => true, 'has_account' => true];
// We will store a score for each class (log-probability).
$scores = [];
foreach ($classCounts as $class => $count) {
    // Start with the log prior: log P(class).
    $logProb = log($count / count($data));
    foreach ($input as $feature => $value) {
        // Booleans become 0/1 keys in PHP arrays.
        $valueKey = (int)$value;
        // Count how often feature=value occurs in this class.
        $featureCount = $featureCounts[$class][$feature][$valueKey] ?? 0;
        $total = $classCounts[$class];
        // Add log P(feature=value | class) using Laplace smoothing:
        // (count + 1) / (total + K), where K is the number of possible values.
        // Here K=2 because the feature is boolean.
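        // Illustrative arithmetic: if 4 of 5 class samples had this value,
        // the smoothed estimate would be (4 + 1) / (5 + 2) ≈ 0.71 instead of
        // the raw 4/5, and an unseen value would get 1/7 rather than 0.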
        $logProb += log(($featureCount + 1) / ($total + 2));
    }
    // Final score for this class.
    $scores[$class] = $logProb;
}
// Highest score (closest to 0) corresponds to the most likely class.
// The first array key after sorting is the predicted class.
arsort($scores);
// Print raw scores for inspection.
print_r($scores);
Result:
Memory: 0.007 Mb
Time running: < 0.001 sec.
Array
(
[buyer] => -1.6739764335717
[browser] => -2.3671236141316
)
The output shows the natural logarithm of the probability score for each class. Because log-probabilities are negative, the larger value (the one closer to 0) marks the more likely class: here buyer (-1.67) beats browser (-2.37). Each score is the log prior plus the sum of the Laplace-smoothed log-likelihoods of the input features, so after arsort() the first array key is the predicted class.
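To read the prediction off programmatically, and to turn the log scores back into probabilities that sum to 1, the example could be extended as follows. This is a minimal sketch: array_key_first() requires PHP 7.3+, and subtracting the maximum score before exp() is a standard underflow guard that is not part of the original code.

// arsort() preserves keys, so the first key is the winning class.
$predicted = array_key_first($scores);
echo "Predicted class: {$predicted}\n"; // buyer

// Optional: normalize the log scores into posterior probabilities.
// Subtracting the maximum before exp() avoids underflow when many
// features push the scores far below zero.
$max = max($scores);
$probs = [];
foreach ($scores as $class => $logProb) {
    $probs[$class] = exp($logProb - $max);
}
$sum = array_sum($probs);
foreach ($probs as $class => $p) {
    $probs[$class] = $p / $sum;
}
print_r($probs);

With the scores printed above this works out to roughly 0.67 for buyer and 0.33 for browser.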