Probability as degree of confidence

Case 3. Multiclass classification and softmax

When developers hear the word "probability", they often imagine dice, coin flips, and the school formula "favorable outcomes divided by all possible outcomes". That picture is useful but very narrow. In machine learning and applied analytics, probability almost always means something else: the degree of our confidence in a statement given the available data.

Scenario

We have a model that must distinguish three types of emails: normal, promo, and spam. Each email uses two simple features: subject length and number of links.

Unlike binary classification, here the model must distribute confidence across several classes at once. That is exactly why SoftmaxClassifier is useful.
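Before reaching for the library, it helps to see what "distributing confidence" means mechanically. The softmax function takes arbitrary real-valued scores (one per class) and turns them into positive numbers that sum to 1. A minimal plain-PHP sketch (the score values are invented for illustration):

```php
<?php

// Softmax: turn raw class scores into a probability distribution
function softmax(array $scores): array
{
    // Subtract the max score before exponentiating for numerical stability;
    // this does not change the result
    $max = max($scores);
    $exps = array_map(fn (float $s): float => exp($s - $max), $scores);
    $sum = array_sum($exps);

    return array_map(fn (float $e): float => $e / $sum, $exps);
}

// Example scores for [normal, promo, spam]
$probs = softmax([2.0, 1.0, 0.1]);
// All outputs lie in (0, 1) and sum to 1; the largest score
// gets the largest share of the confidence
```

Note that softmax preserves the ordering of the scores: the class with the highest raw score always ends up with the highest probability.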

Code example:

<?php

use Rubix\ML\Classifiers\SoftmaxClassifier;
use Rubix\ML\Datasets\Labeled;

// Email features: [subject length, number of links]
// (only the subject lengths survive from the original text;
// the link counts here are illustrative placeholders)
$samples = [
    [40, 0],
    [61, 1],
    [103, 3],
    [124, 4],
    [157, 8],
    [189, 12],
];

// Class labels for the training samples
$labels = [
    'normal',
    'normal',
    'promo',
    'promo',
    'spam',
    'spam',
];

// Build a labeled dataset from features and labels
$dataset = new Labeled($samples, $labels);

// Softmax produces a probability distribution over classes
$model = new SoftmaxClassifier();
// Train the model on the prepared dataset
$model->train($dataset);
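Once trained, Rubix ML classifiers that implement the Probabilistic interface, including SoftmaxClassifier, expose a proba() method that returns one probability distribution per sample. A plain-PHP sketch of reading the model's confidence off such a distribution (the numbers are invented for illustration, not actual model output):

```php
<?php

// A hypothetical distribution of the kind proba() returns for one email
$dist = ['normal' => 0.07, 'promo' => 0.21, 'spam' => 0.72];

// The predicted class is the one the model is most confident in
$predicted = array_search(max($dist), $dist);

// The matching probability is the model's degree of confidence
$confidence = $dist[$predicted];

echo "$predicted ($confidence)\n"; // spam (0.72)
```

This is where "probability as degree of confidence" pays off: instead of a bare label, the model tells you it is 72% sure the email is spam, and you can act differently at 51% confidence than at 99%.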