Why Naive Bayes works
Case 3. Numeric features (Gaussian Naive Bayes)
Implementation in pure PHP
Below is the runnable code: we group samples by class, compute mean and variance for each feature inside each class, and then score a new input by adding log priors and log Gaussian likelihoods.
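The contents of code.php are not reproduced here, so the following is only a minimal sketch of what such a file could contain, under stated assumptions. The names $labels, $stats, and gaussian() are the ones the usage example relies on; the training data, the helper variables $samples and $grouped, the use of population variance, and the small variance floor are all hypothetical choices made for illustration. With this made-up data the scores will not match the result printed below.

```php
<?php
// Hypothetical training data (NOT from the original code.php):
// each sample is a pair of numeric features.
$samples = [[90, 8], [110, 10], [95, 9], [10, 1], [20, 2], [15, 3]];
$labels  = ['active', 'active', 'active', 'inactive', 'inactive', 'inactive'];

// Gaussian probability density N(x; mean, variance).
function gaussian(float $x, float $mean, float $variance): float
{
    // Assumption: floor the variance so a constant feature does not divide by zero.
    $variance = max($variance, 1e-9);
    return exp(-(($x - $mean) ** 2) / (2 * $variance)) / sqrt(2 * M_PI * $variance);
}

// Group feature values by class: $grouped[class][featureIndex] = [values...].
$grouped = [];
foreach ($samples as $n => $sample) {
    foreach ($sample as $i => $value) {
        $grouped[$labels[$n]][$i][] = $value;
    }
}

// Mean and population variance of every feature inside every class.
$stats = [];
foreach ($grouped as $class => $features) {
    foreach ($features as $i => $values) {
        $count = count($values);
        $mean = array_sum($values) / $count;
        $variance = array_sum(array_map(fn($v) => ($v - $mean) ** 2, $values)) / $count;
        $stats[$class][$i] = ['mean' => $mean, 'variance' => $variance];
    }
}
```

The resulting $stats array has the shape the scoring loop expects: one entry per class, and inside it one ['mean' => ..., 'variance' => ...] pair per feature index.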
Example of use
<?php
include __DIR__ . '/code.php';

// New numeric sample to classify.
$input = [100, 9];

// Class priors P(class) based on label frequencies.
$classCounts = array_count_values($labels);
$total = count($labels);

// We compute a log-score for each class.
$scores = [];
foreach ($stats as $class => $features) {
    // Start with the log prior probability log P(class).
    $logProb = log($classCounts[$class] / $total);

    foreach ($features as $i => $params) {
        $featureIndex = (int)$i;
        // Skip features that are missing from the input.
        if (!isset($input[$featureIndex])) {
            continue;
        }

        // Gaussian likelihood of this feature value given the class.
        $prob = gaussian($input[$featureIndex], $params['mean'], $params['variance']);

        // log(0) protection: floor the density before taking the log.
        $prob = max($prob, 1e-12);
        $logProb += log($prob);
    }

    $scores[$class] = $logProb;
}

// Sort classes by score, highest first; the first key is the prediction.
arsort($scores);
print_r($scores);
Result:
Memory: 0.01 Mb
Time running: < 0.001 sec.
Array
(
[active] => -18.640462159403
[inactive] => -55.955189412417
)
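The output is the log score of each class: the log prior plus the sum of the log Gaussian likelihoods of the features. The higher the score, the more likely the class. Here both scores are negative, so the higher one is the one closer to 0, but this is not guaranteed in general because a Gaussian density can exceed 1 and its log is then positive. Since arsort() puts the largest value first, the first class in the array is the prediction.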
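One detail of the printed result is worth checking by hand. The sketch below assumes two features and equal class priors of 0.5; neither is stated in the text, both are inferred from the numbers. Under those assumptions, the lowest score a class can receive is the log prior plus two times the log of the 1e-12 floor.

```php
<?php
// Assumption: equal priors (0.5) and two features, inferred from the output.
$logPrior = log(0.5);

// Each feature contributes at least log(1e-12) because of the log(0) protection.
$logFloor = log(1e-12);

// Lowest possible score when both feature likelihoods hit the floor.
$floorScore = $logPrior + 2 * $logFloor;

echo $floorScore, PHP_EOL;
```

This evaluates to about -55.9552, which agrees with the printed inactive score. That suggests both feature likelihoods for inactive were clamped to 1e-12, so the size of the gap between the two classes probably reflects the chosen floor rather than the true densities. The ranking itself is unaffected.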
How to read the output
Comparison:
active = -18.640
inactive = -55.955
-18.640 > -55.955 → the model selects active
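Log scores are convenient for ranking but hard to interpret as probabilities. As a minimal sketch, assuming active and inactive are the only two classes, the scores can be normalized into posterior probabilities by shifting them by the maximum before exponentiating (the log-sum-exp trick). The variable names $expShifted and $posteriors are illustrative and not part of the original code.

```php
<?php
// Log scores copied from the printed result.
$scores = ['active' => -18.640462159403, 'inactive' => -55.955189412417];

// Subtract the maximum first so the largest term becomes exp(0) = 1
// and the sum cannot underflow to zero.
$max = max($scores);
$expShifted = array_map(fn($s) => exp($s - $max), $scores);

// Normalize so the values sum to 1. array_map keeps the class names as keys.
$sum = array_sum($expShifted);
$posteriors = array_map(fn($e) => $e / $sum, $expShifted);

print_r($posteriors);
```

With a gap of roughly 37.3 log units, the inactive posterior is on the order of 6e-17, so active receives essentially all of the probability mass.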