Why Naive Bayes works

Case 3. Numeric features (Gaussian Naive Bayes)


Implementation in pure PHP

So far we worked with categorical features: a word is present or not, a feature is 0 or 1. But in real tasks data is often numeric – time, counts, temperature, distance.

In this case we show how Naive Bayes works with numeric features using Gaussian Naive Bayes, where each feature within a class is assumed to be normally distributed. Instead of frequencies, we estimate distribution parameters (mean and variance) and compute likelihoods via the Gaussian PDF. The overall logic stays the same: product of probabilities → sum of log-probabilities.

 
<?php

// Toy numeric dataset: each sample has 2 numeric features.
$samples = [
    [
12010],
    [
13012],
    [
201],
    [
302],
];

// Class label for each sample.
$labels = ['active''active''inactive''inactive'];

// Mean and variance estimators (population variance).
function mean(array $values): float {
    return 
array_sum($values) / count($values);
}

function 
variance(array $valuesfloat $mean): float {
    
$sum 0.0;

    foreach (
$values as $v) {
        
$sum += pow($v $mean2);
    }

    return 
$sum count($values);
}

// Gaussian probability density function (likelihood for a numeric feature).
function gaussian(float $xfloat $meanfloat $variance): float {
    return (
sqrt(pi() * $variance)) * exp(-pow($x $mean2) / ($variance));
}

// Group training samples by class.
$grouped = [];

foreach (
$samples as $i => $sample) {
    
$class $labels[$i];
    
$grouped[$class][] = $sample;
}

// Per-class, per-feature statistics: [class][featureIndex] => ['mean' => ..., 'variance' => ...]
$stats = [];

foreach (
$grouped as $class => $rows) {
    
// Transpose rows into columns to get feature-wise arrays of values.
    
$features array_map(null, ...$rows);

    foreach (
$features as $i => $values) {
        
$m mean($values);
        
$v variance($values$m);

        
$stats[$class][$i] = [
            
'mean' => $m,
            
'variance' => $v ?: 1e-6,
        ];
    }
}

Implementation with RubixML

Then we solve the same toy task using RubixML GaussianNB: we build a labeled dataset, train the model, and predict the class for a new numeric vector.

 
<?php

use Rubix\ML\Classifiers\GaussianNB;
use 
Rubix\ML\Datasets\Labeled;

$samples = [
    [
12010],
    [
13012],
    [
20,  1],
    [
30,  2],
];

$labels = ['active''active''inactive''inactive'];

$dataset = new Labeled($samples$labels);

$model = new GaussianNB();
$model->train($dataset);