Logistic regression

Case 4. Ad click (CTR)

Ad click prediction is one of the largest-scale and most practical tasks in machine learning. Each banner impression is a tiny decision: will the user click or not?
For a long time, logistic regression was the de facto standard for exactly this task. It is simple, fast, scales well, and outputs a probability directly.

Case Goal
Predict the probability that a user will click an ad using simple behavioral features.

Important: we want not just a "click / no click" label, but the click probability itself, that is, the predicted CTR (click-through rate) for a single impression.
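
To make the difference between a label and a probability concrete, here is a minimal plain-PHP sketch of what logistic regression computes: a weighted sum of the features passed through the sigmoid function. The weights, the bias, and the helper names `sigmoid` and `clickProbability` are hypothetical and chosen only for illustration; a trained model learns its own coefficients from data.

```php
<?php

// Hypothetical coefficients, for illustration only (not learned from real data).
$weights = ['time_on_page' => 0.05, 'clicked_before' => 1.2];
$bias = -1.5;

// Sigmoid squashes any real number into the (0, 1) range.
function sigmoid(float $z): float
{
    return 1.0 / (1.0 + exp(-$z));
}

// Linear score (bias + weighted sum of features) turned into a probability.
function clickProbability(array $features, array $weights, float $bias): float
{
    $z = $bias;
    foreach ($weights as $name => $weight) {
        $z += $weight * $features[$name];
    }

    return sigmoid($z);
}

$p = clickProbability(['time_on_page' => 14, 'clicked_before' => 1], $weights, $bias);

printf("Estimated click probability: %.3f\n", $p);

// A hard label is just this probability compared against a threshold.
$label = $p >= 0.5 ? 'click' : 'no_click';
```

The probability carries more information than the label: two impressions can both be labeled "click" while having very different chances of actually being clicked.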

Code example:

<?php

use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;

// Features: [time_on_page, clicked_before]
// - time_on_page: numeric (e.g. seconds or arbitrary time units)
// - clicked_before: 0/1 flag (whether the user has clicked ads before)
$samples = [
    [9, 0],
    [12, 1],
    [18, 1],
    [22, 0],
    [14, 1],
];

// RubixML classifiers expect categorical labels, so we use strings.
$labels = ['no_click', 'click', 'click', 'no_click', 'click'];

// Supervised dataset: feature vectors + class labels.
$dataset = new Labeled($samples, $labels);

// Train logistic regression on the labeled dataset.
$model = new LogisticRegression();
$model->train($dataset);
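
Since the goal is the click probability rather than just the label, a natural next step is to query the trained model for it. The sketch below continues the script above and assumes `$model` is the trained classifier; the sample values for the new impression are made up for illustration. It relies on the `proba()` and `predict()` methods of Rubix ML estimators and on the `Unlabeled` dataset class.

```php
// Place this import with the other `use` statements at the top of the script.
use Rubix\ML\Datasets\Unlabeled;

// Hypothetical new impression: 15 time units on page, has clicked ads before.
$impression = new Unlabeled([
    [15, 1],
]);

// proba() returns one array per sample, keyed by class label.
$probabilities = $model->proba($impression);

printf("P(click) = %.3f\n", $probabilities[0]['click']);

// predict() returns only the most likely label for each sample.
$predictions = $model->predict($impression);

echo 'Predicted label: ', $predictions[0], PHP_EOL;
```

With only five training samples the exact numbers are not meaningful; the point is that `proba()` exposes the estimated probability, while `predict()` collapses it into a single label.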