Probability as degree of confidence
Case 1. Spam filter: probability ≠ decision
When developers hear the word "probability", they often imagine dice, coin flips, and the school formula "favorable outcomes divided by all possible outcomes". This is useful, but a very narrow picture. In machine learning and applied analytics, probability almost always means something else – the degree of our confidence in a statement given the available data.
Example of use:
<?php
use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
// Each sample represents an email.
// Features: [subject length (tokens), number of links in the message body]
$samples = [
[3, 1], // short subject, few links
[15, 8], // long subject, many links
[5, 0], // medium subject, no links
];
// Class labels for each email: inbox mail vs spam
$labels = ['normal', 'spam', 'normal'];
// Supervised training dataset: feature vectors + class labels
$dataset = new Labeled($samples, $labels);
// Logistic regression classifier from RubixML
$model = new LogisticRegression();
// Train the classifier on the labeled dataset
$model->train($dataset);
// New incoming email we want to classify
// Same feature schema: [subject length, number of links]
$sample = new Unlabeled([[12, 6]]);
// Predicted probability for each class (e.g. ['normal' => 0.32, 'spam' => 0.68])
$probabilities = $model->proba($sample)[0];