Logistic regression
Case 3. Spam or not spam
Case Goal:
We will build a simple model that detects whether an email is spam using two basic features:
1) Number of links in the email
2) Email length
This will let you see how logistic regression works not just on a single axis, but in a two-dimensional feature space.
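To make the two-dimensional picture concrete, here is a minimal sketch of what a logistic model computes for one email: a weighted sum of both features plus a bias, passed through the sigmoid function. The function names `sigmoid` and `spamProbability`, the weights, and the bias are all hypothetical values chosen for illustration; they are not the parameters the library learns.

```php
<?php

// Logistic (sigmoid) function: maps any real score to the interval (0, 1).
function sigmoid(float $z): float
{
    return 1.0 / (1.0 + exp(-$z));
}

// Probability of "spam" for one email described by two features.
// $weights and $bias are hypothetical, not learned values.
function spamProbability(float $links, float $length, array $weights, float $bias): float
{
    // Linear score: one weight per feature, plus a bias term.
    $z = $weights[0] * $links + $weights[1] * $length + $bias;

    return sigmoid($z);
}

// Hypothetical parameters, picked by hand for this sketch.
$weights = [0.8, 0.004];
$bias = -3.0;

// An email with 3 links and 200 characters.
$p = spamProbability(3, 200, $weights, $bias);

echo 'P(spam) = ' . round($p, 3) . PHP_EOL;
echo 'Label: ' . ($p >= 0.5 ? 'spam' : 'not_spam') . PHP_EOL;
```

With two features, the points where the score equals zero form a straight line in the (links, length) plane; emails on one side are classified as spam, and those on the other side as not spam.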
Example of use
<?php
use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
// Features: [number_of_links, email_length]
$samples = [
    [0, 50],
    [1, 120],
    [5, 300],
    [7, 500],
    [0, 40],
];
$labels = ['not_spam', 'not_spam', 'spam', 'spam', 'not_spam'];
$dataset = new Labeled($samples, $labels);
$model = new LogisticRegression();
$model->train($dataset);
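// Predict for a new email: 3 links, 200 characters
$newEmail = new Unlabeled([[3, 200]]);
// Use a separate variable so the training labels are not overwritten
$predictions = $model->predict($newEmail);
echo 'Predicted label: ';
print_r($predictions);
// Show the estimated probability of each class for the new email
$probas = $model->proba($newEmail);
echo "\nProbabilities (per class): ";
print_r($probas[0]);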
Result:
Memory: 1.191 Mb
Time running: 0.015 sec.
Predicted label: Array
(
    [0] => not_spam
)
Probabilities (per class): Array
(
    [not_spam] => 1
    [spam] => 1.1821725907962E-106
)
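The probabilities above are pushed almost all the way to 0 and 1. One plausible contributor (an assumption, not something the output confirms) is that the two features live on very different scales: link counts range from 0 to 7, while lengths range from 40 to 500. A common remedy is to standardize each feature to zero mean and unit variance before training. The sketch below does this by hand in plain PHP for the training samples used earlier; the helper names `columnStats` and `standardize` are hypothetical. Rubix ML also provides a `ZScaleStandardizer` transformer for the same purpose.

```php
<?php

// Training samples from the example: [number_of_links, email_length].
$samples = [[0, 50], [1, 120], [5, 300], [7, 500], [0, 40]];

// Compute the mean and population standard deviation of each column.
function columnStats(array $samples): array
{
    $n = count($samples);
    $stats = [];

    foreach (array_keys($samples[0]) as $j) {
        $column = array_column($samples, $j);
        $mean = array_sum($column) / $n;

        $sumSquares = 0.0;
        foreach ($column as $value) {
            $sumSquares += ($value - $mean) ** 2;
        }

        $stats[$j] = ['mean' => $mean, 'std' => sqrt($sumSquares / $n)];
    }

    return $stats;
}

// Rescale one sample using statistics computed on the training data.
// Assumes every column has a non-zero standard deviation.
function standardize(array $sample, array $stats): array
{
    $scaled = [];

    foreach ($sample as $j => $value) {
        $scaled[$j] = ($value - $stats[$j]['mean']) / $stats[$j]['std'];
    }

    return $scaled;
}

$stats = columnStats($samples);

// The new email from the example, expressed in standardized units.
$scaledEmail = standardize([3, 200], $stats);
print_r($scaledEmail);
```

After standardization both features are expressed in standard deviations from the training mean, so neither dominates the linear score simply because of its units. Whether this changes the prediction for this particular email would need to be checked by retraining on the scaled data.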