Logistic regression

Case 3. Spam or not spam

Case Goal:
We will build a simple model that detects whether an email is spam using two basic features:

1) Number of links in the email
2) Email length

This will let you see how logistic regression works not just on a single axis, but in a two-dimensional feature space.
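To make the two-dimensional picture concrete, here is a minimal sketch in plain PHP of what a logistic regression model computes for a single email: a weighted sum of the two features, passed through the sigmoid function. The weights and bias are made-up illustrative values, not the ones Rubix ML learns, and the function names `sigmoid` and `spamProbability` are hypothetical.

```php
<?php

// Sigmoid squashes any real number into the interval (0, 1).
function sigmoid(float $z): float
{
    return 1.0 / (1.0 + exp(-$z));
}

// Hypothetical helper: probability of "spam" for one email,
// given one weight per feature plus a bias term.
function spamProbability(array $features, array $weights, float $bias): float
{
    $z = $bias;
    foreach ($features as $i => $value) {
        $z += $weights[$i] * $value;
    }

    return sigmoid($z);
}

// Illustrative (not learned) parameters for [number_of_links, email_length].
$weights = [0.8, 0.005];
$bias = -3.0;

// Email with 3 links and 200 characters: z = -3.0 + 2.4 + 1.0 = 0.4
$p = spamProbability([3, 200], $weights, $bias);

printf("P(spam) = %.3f\n", $p);
echo $p >= 0.5 ? "spam\n" : "not_spam\n";
```

With these assumed parameters, the boundary between the two classes is the straight line 0.8 * links + 0.005 * length - 3 = 0 in the feature plane. Emails on one side get a probability above 0.5, emails on the other side below it.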

 
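<?php

// Load Composer's autoloader (assumes Rubix ML was installed via Composer
// and this script sits next to the vendor/ directory).
require_once __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;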

// Features: [number_of_links, email_length]
$samples = [
    [0, 50],
    [1, 120],
    [5, 300],
    [7, 500],
    [0, 40],
];

// One label per sample, in the same order
$labels = ['not_spam', 'not_spam', 'spam', 'spam', 'not_spam'];

$dataset = new Labeled($samples, $labels);

$model = new LogisticRegression();
$model->train($dataset);

// Predict for a new email with 3 links and length 200
$prediction = new Unlabeled([[3, 200]]);
$labels = $model->predict($prediction);

echo 'Predicted label: ';
print_r($labels);

// Show probabilities
$probas = $model->proba($prediction);
echo "\nProbabilities (per class): ";
print_r($probas[0]);

Result:

Predicted label: Array
(
    [0] => not_spam
)

Probabilities (per class): Array
(
    [not_spam] => 1
    [spam] => 1.1821725907962E-106
)
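
The probabilities above are saturated: one class receives essentially all of the probability mass. As a rough check, the sketch below converts the reported spam probability back into a log-odds value (logit) in plain PHP. The helper name `logit` is hypothetical. The suggested cause in the comments, unscaled features such as an email length in the hundreds, is an assumption about this run rather than something the output proves.

```php
<?php

// Hypothetical helper: inverse of the sigmoid, log(p / (1 - p)).
function logit(float $p): float
{
    return log($p / (1.0 - $p));
}

// Spam probability copied from the output above.
$pSpam = 1.1821725907962E-106;

// A logit this far below zero means the sigmoid is fully saturated.
// Assumption: large raw feature values (email length in the hundreds)
// may be what drives the weighted sum to such an extreme.
printf("logit(spam) = %.1f\n", logit($pSpam));
```

The logit comes out at roughly -244, far outside the range of about -5 to 5 where the sigmoid still changes visibly. If more graded probabilities matter, standardizing the features before training (for example with Rubix ML's ZScaleStandardizer transformer) is a common thing to try. Whether it changes this particular result is untested here.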