Logistic regression

Case 3. Spam or not spam

Case Goal:
We will build a simple model that detects whether an email is spam using two basic features:

1) Number of links in the email
2) Email length

This will let you see how logistic regression works not just on a single axis, but in a two-dimensional feature space.
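
To make the two-dimensional idea concrete, here is a minimal sketch in plain PHP of how a logistic model turns the two features into a probability. The weights, the bias, and the helper names sigmoid() and spamProbability() are hypothetical and chosen only for illustration; they are not the values the trained model in this case learns.

```php
<?php

declare(strict_types=1);

// Hypothetical parameters, for illustration only.
// They are NOT the values learned by the model in this case.
const WEIGHT_LINKS  = 0.9;    // contribution of each link
const WEIGHT_LENGTH = 0.004;  // contribution of each unit of email length
const BIAS          = -3.0;   // shifts the decision boundary

/** Logistic (sigmoid) function: maps any real number into (0, 1). */
function sigmoid(float $z): float
{
    return 1.0 / (1.0 + exp(-$z));
}

/** Probability of "spam" for an email described by two features. */
function spamProbability(float $links, float $length): float
{
    // Weighted sum of both features plus the bias (the "logit").
    $z = WEIGHT_LINKS * $links + WEIGHT_LENGTH * $length + BIAS;

    return sigmoid($z);
}

// Each email is a point [number_of_links, email_length] in the feature plane.
foreach ([[0, 50], [3, 200], [7, 500]] as [$links, $length]) {
    printf(
        "links=%d length=%d -> P(spam)=%.3f\n",
        $links,
        $length,
        spamProbability($links, $length)
    );
}
```

With these assumed parameters, the points where the weighted sum equals zero form a straight line in the (links, length) plane. Emails on one side of it get a probability above 0.5, and emails on the other side get a probability below 0.5.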

Example of use

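<?php

// Load Composer's autoloader. This assumes Rubix ML was installed with
// Composer and that this script sits next to the vendor/ directory.
require_once __DIR__ . '/vendor/autoload.php';
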
use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Features: [number_of_links, email_length]
$samples = [
    [0, 50],
    [1, 120],
    [5, 300],
    [7, 500],
    [0, 40],
];

$labels = ['not_spam', 'not_spam', 'spam', 'spam', 'not_spam'];

$dataset = new Labeled($samples, $labels);

$model = new LogisticRegression();
$model->train($dataset);

// Predict the label for a new email with 3 links and a length of 200
$newEmail = new Unlabeled([[3, 200]]);
$predicted = $model->predict($newEmail);

echo 'Predicted label: ';
print_r($predicted);

// Show the estimated probability of each class for the same email
$probas = $model->proba($newEmail);
echo "\nProbabilities (per class): ";
print_r($probas[0]);

Result (memory: 1.191 MB, running time: 0.023 sec):

Predicted label: Array
(
    [0] => spam
)

Probabilities (per class): Array
(
    [not_spam] => 0
    [spam] => 1
)
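
The probabilities above are exactly 0 and 1. One possible reading, which the source does not confirm, is that the weighted sum for this email is large enough for the sigmoid to round to 1.0 in floating point, which is plausible when a feature such as email length takes values in the hundreds and is not rescaled. The sketch below uses plain PHP and hypothetical logit values (not taken from the trained model) to show that rounding effect.

```php
<?php

declare(strict_types=1);

/** Logistic (sigmoid) function. */
function sigmoid(float $z): float
{
    return 1.0 / (1.0 + exp(-$z));
}

// Hypothetical logits (weighted sums), for illustration only.
foreach ([2.0, 10.0, 40.0] as $z) {
    $pSpam = sigmoid($z);
    $pNotSpam = 1.0 - $pSpam; // complement, as in a two-class model

    echo 'z = ', $z, ': spam = ', $pSpam, ', not_spam = ', $pNotSpam, "\n";
}
```

For the largest logit, exp(-40) is far smaller than the spacing between floating-point numbers near 1, so the sigmoid evaluates to exactly 1.0 and its complement to exactly 0.0. A result like this reflects limited numeric precision rather than absolute certainty about the email.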