Logistic regression

Case 1. Logistic regression for customer churn


Implementation in pure PHP

Customer churn is one of the most typical binary classification tasks: a user either stays with the product or leaves. What we really care about is not just the final "yes or no" but the probability of churn: how high the risk is and whether we should react. This is exactly where logistic regression fits well. We will build a simple model in pure PHP that estimates the churn probability from user behavior and see how this probability is formed from the features.
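Before turning to code, it may help to write down the model in symbols. The notation is an assumption made here for illustration: $x$ is the feature vector of one user, $w$ the weights, $b$ the bias, $y \in \{0, 1\}$ the label, and $\eta$ the learning rate.

$$
\begin{aligned}
z &= w \cdot x + b, \qquad p = \sigma(z) = \frac{1}{1 + e^{-z}} \\
L &= -\bigl[\, y \log p + (1 - y) \log (1 - p) \,\bigr] \\
\frac{\partial L}{\partial z} &= p - y, \qquad
\frac{\partial L}{\partial w_j} = (p - y)\, x_j, \qquad
\frac{\partial L}{\partial b} = p - y \\
w_j &\leftarrow w_j - \eta\,(p - y)\, x_j, \qquad
b \leftarrow b - \eta\,(p - y)
\end{aligned}
$$

The last line is a single stochastic gradient descent step on the log-loss for one example.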

 
<?php

// Sigmoid activation: maps any real value to (0, 1),
// which we interpret as a probability of churn.
function sigmoid(float $z): float {
    return 1.0 / (1.0 + exp(-$z));
}

// Dot product between two equal-length vectors.
function dot(array $a, array $b): float {
    $sum = 0.0;

    foreach ($a as $i => $v) {
        $sum += $v * $b[$i];
    }

    return $sum;
}

// Toy dataset: each row is a user described by 3 numeric features.
// For example, they might represent simple behavioral or profile metrics.
$X = [
    [1.0, 5.2, 30.0],
    [12.0, 15.1, 400.0],
    [3.0, 4.8, 60.0],
    [20.0, 25.0, 800.0],
];

// Binary labels: 1 = user churned, 0 = user stayed.
$y = [1, 0, 1, 0];

// Model parameters: 3 weights (one per feature) and a scalar bias term.
$weights = [0.0, 0.0, 0.0];
$bias = 0.0;

// Training hyperparameters: learning rate and number of epochs.
$learningRate = 0.01;
$epochs = 1000;

// Stochastic gradient descent over the dataset.
for ($epoch = 0; $epoch < $epochs; $epoch++) {
    foreach ($X as $i => $x) {
        // Linear score z = w · x + b.
        $z = dot($weights, $x) + $bias;

        // Predicted probability of churn via sigmoid.
        $p = sigmoid($z);

        // Gradient of log-loss w.r.t. z for this example: (p - y).
        $error = $p - $y[$i];

        // Gradient step for each weight parameter.
        foreach ($weights as $j => $w) {
            $weights[$j] -= $learningRate * $error * $x[$j];
        }

        // Gradient step for the bias term.
        $bias -= $learningRate * $error;
    }
}
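To see how a probability is formed from the features, it can help to break the linear score into per-feature contributions. The sketch below is only an illustration: `explainChurn` is a hypothetical helper, and the weights, bias, and new user are made-up placeholder values, not the result of the training loop. The trained `$weights` and `$bias` could be passed in the same way.

```php
<?php

// Hypothetical helper (not part of the original listing): returns the churn
// probability together with each feature's contribution to the linear score z.
function explainChurn(array $weights, float $bias, array $features): array {
    $contributions = [];
    $z = $bias;

    foreach ($features as $j => $value) {
        $contributions[$j] = $weights[$j] * $value;
        $z += $contributions[$j];
    }

    return [
        'contributions' => $contributions,
        'z' => $z,
        'probability' => 1.0 / (1.0 + exp(-$z)),
    ];
}

// Placeholder parameters chosen for readability, NOT trained values.
$demoWeights = [0.4, -0.3, -0.01];
$demoBias = 0.5;

// Made-up feature vector for a new user.
$newUser = [2.0, 6.0, 50.0];

$result = explainChurn($demoWeights, $demoBias, $newUser);

foreach ($result['contributions'] as $j => $c) {
    printf("feature %d contributes %+.2f to z\n", $j, $c);
}

// With these placeholder values z is -1.00 and the probability is about 0.269.
printf("z = %.2f, churn probability = %.3f\n", $result['z'], $result['probability']);
```

A positive contribution pushes the probability of churn up and a negative one pushes it down; the sigmoid then squeezes the sum into the range (0, 1).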

Implementation with RubixML

In real projects, people rarely implement logistic regression from scratch. It is much more convenient to use existing libraries. The very same churn case implemented with RubixML becomes noticeably shorter and closer to how it is often done in production:

 
<?php

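// Assumes RubixML was installed with Composer (package rubix/ml).
require_once __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;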

// Training samples: each row is a user described by 3 numeric features.
// In a real project these could be behavioral or profile metrics.
$samples = [
    [1, 5.2, 30],
    [12, 15.1, 400],
    [3, 4.8, 60],
    [20, 25.0, 800],
];

// Categorical labels for each user:
// 'churn': the user left,
// 'stay': the user remained active.
$labels = ['churn', 'stay', 'churn', 'stay'];

// Build a labeled dataset from samples and labels.
$dataset = new Labeled($samples, $labels);

// Create a logistic regression classifier with default settings
// and train it on the churn dataset.
$classifier = new LogisticRegression();
$classifier->train($dataset);
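Once the classifier is trained, it can be asked for a class label or for class probabilities. The following is a minimal sketch, assuming RubixML is installed through Composer and autoloaded from `vendor/autoload.php`; the new user's feature values are invented for illustration, and the exact probabilities will depend on the library version and training run.

```php
<?php

// Assumption: RubixML was installed with Composer (package rubix/ml).
require_once __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Same toy data as in the training example.
$trainSamples = [
    [1.0, 5.2, 30.0],
    [12.0, 15.1, 400.0],
    [3.0, 4.8, 60.0],
    [20.0, 25.0, 800.0],
];
$trainLabels = ['churn', 'stay', 'churn', 'stay'];

$estimator = new LogisticRegression();
$estimator->train(new Labeled($trainSamples, $trainLabels));

// Hypothetical new user, made up for illustration.
$newUsers = new Unlabeled([
    [2.0, 6.0, 50.0],
]);

// predict() returns one class label per sample.
$predictions = $estimator->predict($newUsers);

// proba() returns, per sample, an array of class label => probability.
$probabilities = $estimator->proba($newUsers);

echo 'Predicted class: ', $predictions[0], PHP_EOL;

foreach ($probabilities[0] as $class => $probability) {
    printf("P(%s) = %.3f\n", $class, $probability);
}
```

Working with `proba()` rather than `predict()` keeps the risk estimate itself, which is what the churn case is really about: the decision threshold can then be chosen to match how costly a missed churn is.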