Linear regression as a basic model

Case 4. Estimating the likely customer check

In e-commerce, marketplaces, and subscription services it is often useful to estimate a customer’s likely purchase amount (the “check”, i.e. the order total) in advance. The estimate helps with personalization, discount strategy, and prioritizing user segments.

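Below is a minimal baseline regression model that predicts the likely check from simple on-site behavior features. The model predicts the natural logarithm of the check, and the prediction is converted back to money with exp(). Since exp() is always positive, the restored check can never be negative, and the logarithm compresses the long right tail that purchase amounts typically have, which makes the target easier for a linear model to fit.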

  • $x_1$ — number of visits in the period;
  • $x_2$ — time on site (seconds);
  • $x_3$ — number of pageviews;
  • $x_4$ — discount (percent);

The target variable is the log-check: $y = \log(\text{check})$. The model predicts $\hat{y}$ and we restore the check as $\widehat{\text{check}} = \exp(\hat{y})$.
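
Written out for the four features above, the model is an ordinary linear form in log-space. The symbols $w_0, \dots, w_4$ for the intercept and the feature weights are notation introduced here for illustration; they do not appear in the original text.

$$
\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4
$$

$$
\widehat{\text{check}} = \exp(\hat{y}) = e^{w_0} \cdot e^{w_1 x_1} \cdot e^{w_2 x_2} \cdot e^{w_3 x_3} \cdot e^{w_4 x_4}
$$

Two consequences follow directly from this form: the restored check is positive for any weights, and the effects are multiplicative, so increasing $x_j$ by one unit multiplies the predicted check by $e^{w_j}$.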

Example of use:

 
<?php

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Regressors\Ridge;

// Features: visits, time_on_site_seconds, pageviews, discount_percent
$samples = [
    [3, 420, 5, 0],
    [10, 1800, 20, 10],
    [1, 120, 2, 0],
    [7, 900, 12, 5],
];

// Target: log(check_amount)
$labels = [
    log(3500),
    log(12000),
    log(1800),
    log(7200),
];

$dataset = Labeled::build($samples, $labels);

// Ridge regression (L2 regularization)
$model = new Ridge(1.0);
$model->train($dataset);
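
The text describes a final step that the listing does not show: predicting in log-space and restoring the check with exp(). The script below is a minimal, self-contained sketch of that step. It makes several assumptions: Rubix ML is installed through Composer (hence the autoload path), the training rows are illustrative only, and the `$visitor` feature values are hypothetical and chosen purely for demonstration.

```php
<?php

// Assumption: the script sits next to a Composer vendor/ directory with rubix/ml installed.
require __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Regressors\Ridge;

// Illustrative training rows: visits, time_on_site_seconds, pageviews, discount_percent
$samples = [
    [3, 420, 5, 0],
    [10, 1800, 20, 10],
    [1, 120, 2, 0],
    [7, 900, 12, 5],
];

// Target: log(check_amount)
$labels = [log(3500), log(12000), log(1800), log(7200)];

$model = new Ridge(1.0);
$model->train(Labeled::build($samples, $labels));

// Hypothetical new visitor (values invented for this sketch):
// 5 visits, 600 seconds on site, 8 pageviews, 10% discount
$visitor = [5, 600, 8, 10];

// predict() returns one value per sample; here it is the predicted log-check
[$logCheck] = $model->predict(Unlabeled::build([$visitor]));

// Convert the log-space prediction back to a money amount
$check = exp($logCheck);

printf("Predicted check: %.2f\n", $check);
```

With only four training rows the printed number is not meaningful as a forecast; the point of the sketch is the shape of the workflow, where the model never sees or returns raw money amounts and the conversion happens only at the edges.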