Why Naive Bayes works

Case 3. Numeric features (Gaussian Naive Bayes)


Implementation in pure PHP

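The code below is runnable. The approach is to group samples by class, compute the mean and variance of each feature within each class, and then score a new input by summing the log prior and the log Gaussian likelihoods. The script relies on the included code.php for $labels, $stats and gaussian(), and performs the scoring step itself.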
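The contents of code.php are not shown here, so the following is only a minimal sketch of what such a file might contain, under stated assumptions. The training samples and labels are invented for illustration, the variance is the population variance with a small floor, and the shape of $stats (class, then feature index, then 'mean' and 'variance') is inferred from how the scoring script reads it.

```php
<?php

// Hypothetical training data: each sample has two numeric features.
// These values and labels are made up for illustration only.
$samples = [
    [12.0, 0.9],
    [15.0, 0.8],
    [14.0, 0.7],
    [2.0, 0.2],
    [1.0, 0.1],
    [3.0, 0.3],
];
$labels = ['active', 'active', 'active', 'inactive', 'inactive', 'inactive'];

// Group samples by class label.
$grouped = [];
foreach ($samples as $k => $sample) {
    $grouped[$labels[$k]][] = $sample;
}

// For each class and each feature, compute mean and population variance.
$stats = [];
foreach ($grouped as $class => $rows) {
    $n = count($rows);
    foreach (array_keys($rows[0]) as $i) {
        $column = array_column($rows, $i);
        $mean = array_sum($column) / $n;

        $sumSq = 0.0;
        foreach ($column as $value) {
            $sumSq += ($value - $mean) ** 2;
        }

        // Assumption: a small floor avoids division by zero
        // when a feature is constant within a class.
        $variance = max($sumSq / $n, 1e-9);

        $stats[$class][$i] = ['mean' => $mean, 'variance' => $variance];
    }
}

// Gaussian probability density N(x | mean, variance).
function gaussian(float $x, float $mean, float $variance): float
{
    $exponent = -(($x - $mean) ** 2) / (2 * $variance);

    return exp($exponent) / sqrt(2 * M_PI * $variance);
}
```

With a file of this shape in place, the scoring script only needs $labels for the priors, $stats for the per-feature parameters, and gaussian() for the likelihoods.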

 
<?php

include __DIR__ . '/code.php';

// New numeric sample to classify.
$input = [1009];

// Class priors P(class) based on label frequencies.
$classCounts = array_count_values($labels);
$total = count($labels);

// We compute a log-score for each class.
$scores = [];

foreach ($stats as $class => $features) {
    // Start with the log prior probability log P(class).
    $logProb = log($classCounts[$class] / $total);

    foreach ($features as $i => $params) {
        $featureIndex = (int)$i;

        if (!isset($input[$featureIndex])) {
            continue;
        }

        // Add the log Gaussian likelihood for each feature.
        $prob = gaussian($input[$featureIndex], $params['mean'], $params['variance']);
        // log(0) protection: clamp the density to a small positive floor.
        $prob = max($prob, 1e-12);
        $logProb += log($prob);
    }

    $scores[$class] = $logProb;
}

// Sort classes by score (highest / closest to 0 is the prediction).
arsort($scores);
print_r($scores);

Result:
Array
(
    [active] => -18.640462159403
    [inactive] => -55.955189412417
)

The output is the log score for each class. The higher the value (closer to 0), the more likely the class. After sorting, the first class is the prediction.
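One detail worth noting when reading these scores: the inactive value matches log(0.5) + 2 · log(1e-12) ≈ -55.9552 to the printed precision, which is consistent with equal class priors and two feature likelihoods that hit the 1e-12 floor. As a hedged alternative to clamping, the log-density can be computed directly so that exp() is never called. The function name logGaussian and the numbers below are hypothetical and only illustrate the difference.

```php
<?php

// Log of the Gaussian density, computed without calling exp().
// Hypothetical helper, not part of the original code.php.
function logGaussian(float $x, float $mean, float $variance): float
{
    return -0.5 * log(2 * M_PI * $variance)
        - (($x - $mean) ** 2) / (2 * $variance);
}

// Hypothetical parameters: a value far from the class mean.
$x = 100.0;
$mean = 2.0;
$variance = 1.0;

// The density underflows to 0.0, so the clamped version reports log(1e-12).
$density = exp(-(($x - $mean) ** 2) / (2 * $variance)) / sqrt(2 * M_PI * $variance);
$clamped = log(max($density, 1e-12));

// The direct log-density keeps the actual, much more negative value.
$direct = logGaussian($x, $mean, $variance);

printf("clamped: %.4f\n", $clamped);
printf("direct:  %.4f\n", $direct);
```

The two variants trade off differently. The clamp limits how much a single outlying feature can penalize a class, while the direct form preserves how far the input really is from the class mean. Either choice leaves the ranking unchanged in clear-cut cases like the one above.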

How to read the output
Comparison:

active = -18.64
inactive = -55.96

-18.64 > -55.96 → the model selects active
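The comparison above can also be expressed as normalized probabilities. The following is a minimal sketch, assuming the two log scores are copied from the result and that a softmax over them is an acceptable way to normalize. It subtracts the maximum score before exponentiating (the log-sum-exp trick) so that exp() does not underflow for every class at once.

```php
<?php

// Log scores copied from the result above.
$scores = [
    'active'   => -18.640462159403,
    'inactive' => -55.955189412417,
];

// Shift by the maximum so the best class maps to exp(0) = 1.
$max = max($scores);

$expScores = [];
foreach ($scores as $class => $score) {
    $expScores[$class] = exp($score - $max);
}

// Normalize so the values sum to 1.
$sum = array_sum($expScores);

$posteriors = [];
foreach ($expScores as $class => $value) {
    $posteriors[$class] = $value / $sum;
}

print_r($posteriors);
```

A gap of about 37 in log space means the normalized value for active is practically 1. Such numbers are better read as a ranking than as calibrated probabilities, because the independence assumption in Naive Bayes tends to make the model overconfident.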