Probability as degree of confidence
Case 3. Multiclass classification and softmax
When developers hear the word "probability", they often imagine dice, coin flips, and the school formula "favorable outcomes divided by all possible outcomes". That intuition is useful, but it is a very narrow picture. In machine learning and applied analytics, probability almost always means something else: the degree of our confidence in a statement given the available data.
Scenario
We have a model that must distinguish three types of emails: normal, promo, and spam. Each email uses two simple features: subject length and number of links.
Unlike binary classification, here the model must distribute confidence across several classes at once. That is exactly why SoftmaxClassifier is useful.
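Before turning to the library, it helps to see what the softmax function itself does. The sketch below is a minimal plain-PHP implementation (the function name and score values are illustrative, not part of Rubix ML): it maps a vector of raw class scores to positive numbers that sum to 1, i.e. a probability distribution over classes.

```php
<?php

// Softmax: turn raw class scores into a probability distribution.
function softmax(array $scores): array
{
    // Subtract the max score before exponentiating for numerical stability;
    // this does not change the resulting probabilities.
    $max = max($scores);

    $exps = array_map(fn ($s) => exp($s - $max), $scores);
    $sum = array_sum($exps);

    return array_map(fn ($e) => $e / $sum, $exps);
}

// Illustrative raw scores for [normal, promo, spam]
$probs = softmax([2.0, 1.0, 0.1]);

// $probs is roughly [0.66, 0.24, 0.10]: all values are positive,
// they sum to 1, and the highest score gets the largest share.
```

Note that softmax preserves the ranking of the scores but compresses them into comparable confidences, which is exactly the "degree of confidence" reading of probability from the introduction.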
Code example:
<?php

use Rubix\ML\Classifiers\SoftmaxClassifier;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Email features: [subject length, number of links]
$samples = [
    [4, 0],
    [6, 1],
    [10, 3],
    [12, 4],
    [15, 7],
    [18, 9],
];

// Class labels for the training samples
$labels = ['normal', 'normal', 'promo', 'promo', 'spam', 'spam'];

// Build a labeled dataset from features and labels
$dataset = new Labeled($samples, $labels);

// Softmax produces a probability distribution over classes
$model = new SoftmaxClassifier();

// Train the model on the prepared dataset
$model->train($dataset);

// proba() returns, for each sample, an array mapping each class
// to the model's confidence, e.g. ['normal' => ..., 'promo' => ..., 'spam' => ...],
// with the values summing to 1
$probabilities = $model->proba(new Unlabeled([[16, 8]]));