Logistic regression

Case 2. Newsletter subscription

There is no multidimensional space, no complex features, and no intersecting factors. There is one feature and a binary decision. And this is already enough to see the entire logic of logistic regression.

Case Goal:
Predict whether a user will subscribe to an email newsletter based only on the time spent on the site.

The model should answer two questions:
1) What is the probability of subscription?
2) Where is the boundary between "most likely to subscribe" and "most likely not to subscribe"?

Example of code:

 
<?php

use Rubix\ML\Classifiers\LogisticRegression;
use 
Rubix\ML\Datasets\Labeled;
use 
Rubix\ML\Datasets\Unlabeled;

// One feature per user: time spent on the site.
$samples = [
    [
0.5],
    [
1.2],
    [
2.0],
    [
5.0],
    [
7.0],
];

// Labels: subscribed or not.
$labels = ['no''no''no''yes''yes'];

$dataset = new Labeled($samples$labels);

$model = new LogisticRegression();
$model->train($dataset);

// Predict for a new user (3.0 time units).
$prediction = new Unlabeled([[3.0]]);
$labels $model->predict($prediction);

echo 
'Predicted label: ';
print_r($labels);

// Show probabilities
$probas $model->proba($prediction);
echo 
"\nProbabilities (per class): ";
print_r($probas[0]);