Logistic regression
Case 4. Ad click (CTR)
Ad click prediction is one of the largest-scale and most practical tasks in machine learning. Each banner impression is a tiny decision: will the user click or not?
This is exactly where logistic regression was the de facto standard for a long time. It is simple, fast, scales well, and produces a probability right away.
Case Goal
Predict the probability that a user will click an ad using simple behavioral features.
Important: we want not just a "click / no click" label but the click probability itself, that is, the predicted CTR (Click-Through Rate).
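To make the idea of a model that outputs a probability concrete: logistic regression passes a weighted sum of the features through the sigmoid function. A sketch for the two features used in this case, where the symbols $w_1$, $w_2$, and $b$ are illustrative names for the learned weights and bias rather than values from the text:

```latex
z = w_1 \cdot \text{time\_on\_page} + w_2 \cdot \text{clicked\_before} + b,
\qquad
P(\text{click} \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}
```

Because $\sigma(z)$ always lies strictly between 0 and 1, the output can be read directly as a click probability. A hard "click / no click" label only appears once a threshold (commonly 0.5) is applied to it.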
Code example:
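<?php
// Load Composer's autoloader (assumption: Rubix ML was installed via Composer
// and this script sits next to the vendor/ directory).
require_once __DIR__ . '/vendor/autoload.php';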
use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
// Features: [time_on_page, clicked_before]
// - time_on_page: numeric (e.g. seconds or arbitrary time units)
// - clicked_before: 0/1 flag (whether the user has clicked ads before)
$samples = [
    [9, 0],
    [12, 1],
    [18, 1],
    [22, 0],
    [14, 1],
];
// Rubix ML classifiers expect categorical labels, so we use strings.
$labels = ['no_click', 'click', 'click', 'no_click', 'click'];
// Supervised dataset: feature vectors + class labels.
$dataset = new Labeled($samples, $labels);
// Train logistic regression on the labeled dataset.
$model = new LogisticRegression();
$model->train($dataset);