Logistic regression
Case 4. Ad click (CTR)
Ad click prediction is one of the most massive and practical tasks in machine learning. Each banner impression is a tiny decision: will the user click or not.
That is exactly where logistic regression used to be the de facto standard for a long time. It is simple, fast, scales well, and produces a probability right away.
Case Goal
Predict the probability that a user will click an ad using simple behavioral features.
Important: we want not just a "click / no click" label, but the click probability itself – CTR (Click-Through Rate).
Example of use
<?php
use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
// Features: [time_on_page, clicked_before]
// - time_on_page: numeric (e.g. seconds or arbitrary time units)
// - clicked_before: 0/1 flag (whether the user has clicked ads before)
$samples = [
[9, 0],
[12, 1],
[18, 1],
[22, 0],
[14, 1],
];
// RubixML classifiers expect categorical labels, so we use strings.
$labels = ['no_click', 'click', 'click', 'no_click', 'click'];
// Supervised dataset: feature vectors + class labels.
$dataset = new Labeled($samples, $labels);
// Train logistic regression on the labeled dataset.
$model = new LogisticRegression();
$model->train($dataset);
// Predict for a new user impression.
$sampleToPredict = new Unlabeled([[20, 1]]);
$prediction = $model->predict($sampleToPredict);
echo 'Predicted label: ' . PHP_EOL;
print_r($prediction);
$probas = $model->proba($sampleToPredict);
// CTR is the probability of the positive class ('click').
$ctr = $probas[0]['click'] ?? null;
echo PHP_EOL . 'Probability of click (CTR): ' . $ctr . PHP_EOL . PHP_EOL;
echo 'Probabilities (per class): ' . PHP_EOL;
print_r($probas[0]);
Result:
Memory: 1.191 Mb
Time running: 0.006 sec.
Predicted label:
Array
(
[0] => click
)
Probability of click (CTR): 0.99999999817311
Probabilities (per class):
Array
(
[no_click] => 1.8268941914812E-9
[click] => 0.99999999817311
)