Why Naive Bayes works
Case 3. Numeric features (Gaussian Naive Bayes)
Implementation in pure PHP
So far we have worked with categorical features: a word is present or absent, a feature is 0 or 1. In real tasks, though, data is often numeric: time, counts, temperature, distance.
In this case we show how Naive Bayes handles numeric features using Gaussian Naive Bayes, which assumes that each feature is normally distributed within each class. Instead of counting frequencies, we estimate the distribution parameters (mean and variance) for every class and feature, and compute likelihoods via the Gaussian PDF. The overall logic stays the same: product of probabilities → sum of log-probabilities.
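Written out, the model described above takes roughly the following form. The notation is introduced here for illustration and is not taken from the chapter: $x_i$ is the value of feature $i$, $c$ is a class, $P(c)$ is the class prior, and $\mu_{c,i}$ and $\sigma_{c,i}^2$ are the mean and variance of feature $i$ within class $c$.

```latex
% Per-feature likelihood: the Gaussian PDF
p(x_i \mid c) = \frac{1}{\sqrt{2\pi\sigma_{c,i}^2}}
                \exp\!\left(-\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\right)

% Its logarithm, which is what gets summed in practice
\log p(x_i \mid c) = -\tfrac{1}{2}\log\!\left(2\pi\sigma_{c,i}^2\right)
                     - \frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}

% Decision rule: log prior plus the sum of per-feature log-likelihoods
\hat{c} = \arg\max_{c} \left[ \log P(c) + \sum_{i} \log p(x_i \mid c) \right]
```

The first line corresponds to the gaussian() function in the code, and the last line is the "sum of log-probabilities" form of the usual Naive Bayes product.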
Example of code
<?php
// Toy numeric dataset: each sample has 2 numeric features.
$samples = [
    [120, 10],
    [130, 12],
    [20, 1],
    [30, 2],
];

// Class label for each sample.
$labels = ['active', 'active', 'inactive', 'inactive'];

// Mean and variance estimators (population variance).
function mean(array $values): float {
    return array_sum($values) / count($values);
}

function variance(array $values, float $mean): float {
    $sum = 0.0;
    foreach ($values as $v) {
        $sum += pow($v - $mean, 2);
    }
    return $sum / count($values);
}

// Gaussian probability density function (likelihood for a numeric feature).
function gaussian(float $x, float $mean, float $variance): float {
    return (1 / sqrt(2 * pi() * $variance)) * exp(-pow($x - $mean, 2) / (2 * $variance));
}

// Group training samples by class.
$grouped = [];
foreach ($samples as $i => $sample) {
    $class = $labels[$i];
    $grouped[$class][] = $sample;
}
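
// Per-class, per-feature statistics: [class][featureIndex] => ['mean' => ..., 'variance' => ...]
$stats = [];
foreach ($grouped as $class => $rows) {
    // Collect each feature's values column by column. array_column() also works
    // when a class has a single sample, where array_map(null, ...$rows) would
    // return the row itself instead of its transpose.
    $featureCount = count($rows[0]);
    for ($i = 0; $i < $featureCount; $i++) {
        $values = array_column($rows, $i);
        $m = mean($values);
        $v = variance($values, $m);
        $stats[$class][$i] = [
            'mean' => $m,
            // Floor the variance so a constant feature does not cause division by zero.
            'variance' => max($v, 1e-6),
        ];
    }
}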
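The training code above stops at the per-class statistics. A minimal sketch of the remaining prediction step might look like the following. It is not the chapter's own code: the helper names logGaussian() and classScores(), the $priors array, and the query point [110, 9] are assumptions made for illustration. The statistics are hardcoded to the values the training code produces for the toy dataset so that the block runs on its own.

```php
<?php
// Statistics the training step yields for the toy dataset (population variance).
$stats = [
    'active' => [
        ['mean' => 125.0, 'variance' => 25.0],
        ['mean' => 11.0, 'variance' => 1.0],
    ],
    'inactive' => [
        ['mean' => 25.0, 'variance' => 25.0],
        ['mean' => 1.5, 'variance' => 0.25],
    ],
];

// Class priors estimated as class frequencies: 2 of 4 samples each.
$priors = ['active' => 0.5, 'inactive' => 0.5];

// Log of the Gaussian density. Working in logs avoids numeric underflow
// when many small likelihoods would otherwise be multiplied.
function logGaussian(float $x, float $mean, float $variance): float {
    return -0.5 * log(2 * M_PI * $variance) - (($x - $mean) ** 2) / (2 * $variance);
}

// Score each class as log prior + sum of per-feature log-likelihoods.
function classScores(array $x, array $stats, array $priors): array {
    $scores = [];
    foreach ($stats as $class => $featureStats) {
        $score = log($priors[$class]);
        foreach ($featureStats as $i => $s) {
            $score += logGaussian($x[$i], $s['mean'], $s['variance']);
        }
        $scores[$class] = $score;
    }
    return $scores;
}

// Hypothetical query point, chosen only for illustration.
$x = [110, 9];
$scores = classScores($x, $stats, $priors);

arsort($scores);                        // sort by score, highest first, keeping class keys
$predicted = array_key_first($scores);  // class with the largest log-score
echo "Predicted class: {$predicted}\n";
```

For this query point the active class wins by a wide margin, because both feature values sit close to the active means and many standard deviations away from the inactive ones. The scores are unnormalized log-values, so they are useful for ranking classes but are not probabilities.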
Implementation with RubixML
Then we solve the same toy task using RubixML GaussianNB: we build a labeled dataset, train the model, and predict the class for a new numeric vector.
Example of code
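<?php
// Assumes RubixML was installed with Composer; adjust the autoload path to your project.
require_once __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Classifiers\GaussianNB;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Same toy dataset as in the pure PHP version.
$samples = [
    [120, 10],
    [130, 12],
    [20, 1],
    [30, 2],
];
$labels = ['active', 'active', 'inactive', 'inactive'];

// Build a labeled dataset and train the model.
$dataset = new Labeled($samples, $labels);
$model = new GaussianNB();
$model->train($dataset);

// Predict the class of a new numeric vector (hypothetical values, for illustration).
$predictions = $model->predict(new Unlabeled([[110, 9]]));
echo $predictions[0] . PHP_EOL;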