Why Naive Bayes works

Case 1. Categorical features and frequencies


Implementation in pure PHP

In this case we look at the simplest form of Naive Bayes, with categorical features. We explicitly count per-class frequencies, convert them into probabilities, and classify a new user.

 
<?php

// Toy training dataset: each row is a sample with two boolean features and a class label.
$data = [
    ['class' => 'buyer',   'from_ads' => true,  'has_account' => true],
    ['class' => 'buyer',   'from_ads' => false, 'has_account' => true],
    ['class' => 'browser', 'from_ads' => true,  'has_account' => false],
    ['class' => 'browser', 'from_ads' => true,  'has_account' => false],
];

// Count priors (class frequencies) and conditional feature frequencies per class.
// These counts will be used later to compute P(class) and P(feature=value | class).
$classCounts = [];
$featureCounts = [];

foreach ($data as $row) {
    // Count how many samples belong to each class.
    $class = $row['class'];
    $classCounts[$class] = ($classCounts[$class] ?? 0) + 1;

    foreach ($row as $feature => $value) {
        if ($feature === 'class') {
            continue;
        }

        // Count how often each feature value occurs within each class.
        // Note: in PHP array keys, booleans are cast to 0/1.
        $featureCounts[$class][$feature][$value] = ($featureCounts[$class][$feature][$value] ?? 0) + 1;
    }
}
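
The counts can then be turned into probabilities and used to classify a new user. The following is a minimal sketch of that step, not a definitive implementation. It repeats the dataset and the counting loop so that it runs on its own. The helper name classify, the $newUser sample, and the use of add-one (Laplace) smoothing are assumptions made for this illustration; smoothing is included only so that a feature value never seen in a class does not force the whole product to zero.

```php
<?php

// Same toy dataset as in the listing above.
$data = [
    ['class' => 'buyer',   'from_ads' => true,  'has_account' => true],
    ['class' => 'buyer',   'from_ads' => false, 'has_account' => true],
    ['class' => 'browser', 'from_ads' => true,  'has_account' => false],
    ['class' => 'browser', 'from_ads' => true,  'has_account' => false],
];

// Recompute the counts so that this sketch is self-contained.
$classCounts = [];
$featureCounts = [];

foreach ($data as $row) {
    $class = $row['class'];
    $classCounts[$class] = ($classCounts[$class] ?? 0) + 1;

    foreach ($row as $feature => $value) {
        if ($feature === 'class') {
            continue;
        }

        // Cast the boolean to 0/1 explicitly instead of relying on key casting.
        $key = (int) $value;
        $featureCounts[$class][$feature][$key] = ($featureCounts[$class][$feature][$key] ?? 0) + 1;
    }
}

/**
 * Hypothetical helper: returns normalized P(class | features) for one sample,
 * sorted from the most to the least probable class.
 * Assumption: add-one (Laplace) smoothing with two possible values per feature.
 */
function classify(array $sample, array $classCounts, array $featureCounts): array
{
    $total = array_sum($classCounts);
    $scores = [];

    foreach ($classCounts as $class => $count) {
        // Prior: P(class).
        $score = $count / $total;

        foreach ($sample as $feature => $value) {
            // Smoothed likelihood: P(feature = value | class).
            $seen = $featureCounts[$class][$feature][(int) $value] ?? 0;
            $score *= ($seen + 1) / ($count + 2);
        }

        $scores[$class] = $score;
    }

    // Normalize the unnormalized scores so that they sum to 1.
    $sum = array_sum($scores);
    foreach ($scores as $class => $score) {
        $scores[$class] = $score / $sum;
    }

    arsort($scores);

    return $scores;
}

// Hypothetical new user: came from ads and has an account.
$newUser = ['from_ads' => true, 'has_account' => true];

$posterior = classify($newUser, $classCounts, $featureCounts);
$predicted = array_key_first($posterior);

foreach ($posterior as $class => $probability) {
    printf("P(%s | user) = %.3f\n", $class, $probability);
}
echo "Predicted class: {$predicted}\n";
```

With this toy data and add-one smoothing, the user is classified as a buyer with a probability of about 0.667. Without smoothing, the browser class would get a probability of exactly zero, because no browser in the training data has an account.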

Implementation with RubixML

Next, we implement the same example with RubixML to see how the same logic is expressed at the library level.

 
<?php

use Rubix\ML\Classifiers\NaiveBayes;
use Rubix\ML\Datasets\Labeled;

// Training samples for RubixML NaiveBayes.
// Important: Rubix\ML\Classifiers\NaiveBayes works with categorical (discrete) features.
$samples = [
    ['from_ads', 'has_account'],
    ['organic_search', 'has_account'],
    ['from_ads', 'no_account'],
    ['from_ads', 'no_account'],
];

// Class labels for each sample.
$labels = ['buyer', 'buyer', 'browser', 'browser'];

// Build the labeled dataset.
$dataset = new Labeled($samples, $labels);

// Train Naive Bayes model.
$model = new NaiveBayes();
$model->train($dataset);
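
Once the model is trained, it can classify a new user in the same way as the pure PHP version. The sketch below repeats the training step so that it runs on its own, then wraps one new sample in an Unlabeled dataset and calls predict() and proba(). The autoload path and the $newUser sample are assumptions made for this illustration: the script expects RubixML to have been installed with Composer in the same directory, and it prints a hint instead of failing when the library is missing.

```php
<?php

use Rubix\ML\Classifiers\NaiveBayes;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Assumption: RubixML was installed with `composer require rubix/ml`
// in the directory that contains this script.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require $autoload;
}

$prediction = null;

if (class_exists(NaiveBayes::class)) {
    // Same training data as in the listing above.
    $samples = [
        ['from_ads', 'has_account'],
        ['organic_search', 'has_account'],
        ['from_ads', 'no_account'],
        ['from_ads', 'no_account'],
    ];
    $labels = ['buyer', 'buyer', 'browser', 'browser'];

    $model = new NaiveBayes();
    $model->train(new Labeled($samples, $labels));

    // Hypothetical new user: came from ads and has an account.
    $newUser = new Unlabeled([
        ['from_ads', 'has_account'],
    ]);

    // predict() returns one label per sample,
    // proba() returns one [class => probability] map per sample.
    $prediction = $model->predict($newUser)[0];
    $probabilities = $model->proba($newUser)[0];

    echo "Predicted class: {$prediction}\n";
    foreach ($probabilities as $class => $probability) {
        printf("P(%s | user) = %.3f\n", $class, $probability);
    }
} else {
    echo "RubixML not found. Run `composer require rubix/ml` first.\n";
}
```

The exact probabilities depend on the smoothing the library applies, so they may differ from a hand-rolled calculation even when the predicted class is the same.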