Linear regression as a basic model

Case 4. Estimating the likely customer check

In this example we use the Ridge regressor from RubixML. The model is trained on a small dataset to predict log(check), and the result is converted back to a monetary value with exp().
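
The trained $model below comes from code.php, which is not shown here. A minimal sketch of what that file might contain, assuming RubixML's Ridge regressor and a hand-made toy dataset (the sample values and checks are illustrative, not the actual training data):

```php
<?php

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Regressors\Ridge;

require_once __DIR__ . '/vendor/autoload.php';

// Illustrative samples: [visits, time_on_site_seconds, pageviews, discount_percent]
$samples = [
    [1, 120, 2, 0],
    [3, 450, 5, 10],
    [5, 600, 8, 5],
    [2, 300, 4, 0],
    [7, 900, 12, 15],
];

// Targets are log(check), so the model learns in log-space
$checks = [900, 2100, 3600, 1500, 5200];
$labels = array_map('log', $checks);

// alpha = 1.0 controls the strength of the L2 penalty
$model = new Ridge(1.0);

$model->train(new Labeled($samples, $labels));
```

In a real project the dataset would of course be loaded from storage rather than hard-coded.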

<?php

use Rubix\ML\Datasets\Unlabeled;

require_once __DIR__ . '/code.php';

// Customer features for prediction
// [visits, time_on_site_seconds, pageviews, discount_percent]
$customer = [5, 600, 8, 5];

$unlabeled = new Unlabeled([$customer]);
$logPrice = $model->predict($unlabeled);
$predictedPrice = exp($logPrice[0]);

$weights = $model->coefficients();
$bias = $model->bias();

echo 'Predicted check: ' . round($predictedPrice, 2) . PHP_EOL;
echo 'Predicted log(check): ' . round($logPrice[0], 6) . PHP_EOL . PHP_EOL;

echo 'Coefficients (feature weights):' . PHP_EOL;
echo '0 - visits, 1 - time_on_site_seconds, 2 - pageviews, 3 - discount_percent' . PHP_EOL;
print_r($weights);

echo PHP_EOL . 'Bias (intercept): ' . $bias . PHP_EOL;
Result:

Memory: 1.007 Mb Time running: 0.014 sec.
Predicted check: 3661.71
Predicted log(check): 8.205686

Coefficients (feature weights):
0 - visits, 1 - time_on_site_seconds, 2 - pageviews, 3 - discount_percent
Array
(
    [0] => 0.14071907246965
    [1] => -0.00012872304457004
    [2] => 0.092300948419521
    [3] => -0.085823519417687
)

Bias (intercept): 7.2700340926455
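
With the printed weights and bias, the log-space prediction is just a dot product plus the intercept. The following sketch reproduces the 8.205686 figure by hand, using the customer vector [5, 600, 8, 5] from the prediction script:

```php
<?php

// Weights and bias exactly as printed above
$weights = [0.14071907246965, -0.00012872304457004, 0.092300948419521, -0.085823519417687];
$bias = 7.2700340926455;

// Same customer as in the prediction script
$customer = [5, 600, 8, 5];

// log(check) = w · x + b
$logCheck = $bias;
foreach ($customer as $i => $x) {
    $logCheck += $weights[$i] * $x;
}

echo round($logCheck, 6) . PHP_EOL;      // 8.205686
echo round(exp($logCheck), 2) . PHP_EOL; // 3661.71
```

This confirms that Ridge here is an ordinary linear model in log-space; exp() is the only nonlinearity, applied after the fact.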

Why predicting log(check) is a useful trick:

  • checks often have a heavy tail; log() reduces the impact of rare large purchases;
  • in log-space, the model often better matches multiplicative effects (e.g. “twice as many pageviews”);
  • exp() guarantees a positive predicted check;
  • Ridge adds L2 regularization and helps reduce overfitting on small datasets.
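
One consequence of the points above: because the model is linear in log-space, adding Δ to a feature multiplies the predicted check by exp(weight × Δ). A quick sketch with the pageviews weight printed earlier:

```php
<?php

$pageviewsWeight = 0.092300948419521; // taken from the output above

// Going from 8 to 16 pageviews adds weight * 8 in log-space,
// which multiplies the predicted check by exp(weight * 8)
$delta = 16 - 8;
$factor = exp($pageviewsWeight * $delta);

echo round($factor, 2) . PHP_EOL; // ~2.09: doubling pageviews roughly doubles the check
```

This multiplicative reading is exactly why log-targets suit "twice as many pageviews" style effects better than raw-price regression.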

In a real product you would expand the dataset and feature set (traffic source, category interest, purchase history), but even this baseline can be a good starting point.