Logistic regression
Case 2. Newsletter subscription
There is no multidimensional space, no complex features, and no intersecting factors. There is one feature and a binary decision. And this is already enough to see the entire logic of logistic regression.
Case Goal:
Predict whether a user will subscribe to an email newsletter based only on the time spent on the site.
The model should answer two questions:
1) What is the probability of subscription?
2) Where is the boundary between "most likely to subscribe" and "most likely not to subscribe"?
Example of code:
<?php
use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
// One feature per user: time spent on the site.
$samples = [
[0.5],
[1.2],
[2.0],
[5.0],
[7.0],
];
// Labels: subscribed or not.
$labels = ['no', 'no', 'no', 'yes', 'yes'];
$dataset = new Labeled($samples, $labels);
$model = new LogisticRegression();
$model->train($dataset);
// Predict for a new user (3.0 time units).
$prediction = new Unlabeled([[3.0]]);
$labels = $model->predict($prediction);
echo 'Predicted label: ';
print_r($labels);
// Show probabilities
$probas = $model->proba($prediction);
echo "\nProbabilities (per class): ";
print_r($probas[0]);