Logistic regression

Case 4. Ad click (CTR)

Ad click prediction is one of the most massive and practical tasks in machine learning. Each banner impression is a tiny decision: will the user click or not.
That is exactly where logistic regression used to be the de facto standard for a long time. It is simple, fast, scales well, and produces a probability right away.

Case Goal
Predict the probability that a user will click an ad using simple behavioral features.

Important: we want not just a "click / no click" label, but the click probability itself – CTR (Click-Through Rate).

 
<?php

use Rubix\ML\Classifiers\LogisticRegression;
use 
Rubix\ML\Datasets\Labeled;
use 
Rubix\ML\Datasets\Unlabeled;

// Features: [time_on_page, clicked_before]
// - time_on_page: numeric (e.g. seconds or arbitrary time units)
// - clicked_before: 0/1 flag (whether the user has clicked ads before)
$samples = [
    [
90],
    [
121],
    [
181],
    [
220],
    [
141],
];

// RubixML classifiers expect categorical labels, so we use strings.
$labels = ['no_click''click''click''no_click''click'];

// Supervised dataset: feature vectors + class labels.
$dataset = new Labeled($samples$labels);

// Train logistic regression on the labeled dataset.
$model = new LogisticRegression();
$model->train($dataset);

// Predict for a new user impression.
$sampleToPredict = new Unlabeled([[201]]);
$prediction $model->predict($sampleToPredict);

echo 
'Predicted label: ' PHP_EOL;
print_r($prediction);

$probas $model->proba($sampleToPredict);
// CTR is the probability of the positive class ('click').
$ctr $probas[0]['click'] ?? null;

echo 
PHP_EOL 'Probability of click (CTR): ' $ctr PHP_EOL PHP_EOL;

echo 
'Probabilities (per class): ' PHP_EOL;
print_r($probas[0]);

Result: Memory: 1.191 Mb Time running: 0.006 sec.
Predicted label: 
Array
(
    [0] => click
)

Probability of click (CTR): 0.99999999817311

Probabilities (per class): 
Array
(
    [no_click] => 1.8268941914812E-9
    [click] => 0.99999999817311
)