Linear regression as a basic model
Case 4. Estimating the expected customer check
In this example we use the Ridge regressor from Rubix ML. The model is trained on a small dataset to predict log(check), and we then convert the result back to a monetary amount with exp().
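The training code lives in code.php and is not shown here. A minimal sketch of what it might contain, assuming a tiny in-memory dataset and the default alpha; the sample rows, check amounts, and the alpha value are illustrative, not taken from the original:

```php
<?php

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Regressors\Ridge;

// [visits, time_on_site_seconds, pageviews, discount_percent]
$samples = [
    [3, 420, 5, 0],
    [7, 900, 12, 10],
    [2, 180, 3, 5],
    // ... more rows in a real dataset
];

// Targets are log(check), not the raw check amounts
$labels = array_map('log', [2100.0, 5400.0, 950.0]);

// alpha is the L2 regularization strength (1.0 is the library default)
$model = new Ridge(1.0);

$model->train(new Labeled($samples, $labels));
```

After train() completes, $model is ready for the prediction code below.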
Example of use
<?php
use Rubix\ML\Datasets\Unlabeled;
// code.php builds and trains the Ridge model and exposes it as $model
require_once __DIR__ . '/code.php';
// Customer features for prediction
// [visits, time_on_site_seconds, pageviews, discount_percent]
$customer = [5, 600, 8, 5];
$unlabeled = new Unlabeled([$customer]);
// predict() returns an array with one prediction per sample
$logPrice = $model->predict($unlabeled);
// the model works in log space, so exp() converts back to money
$predictedPrice = exp($logPrice[0]);
$weights = $model->coefficients();
$bias = $model->bias();
echo 'Predicted check: ' . round($predictedPrice, 2) . PHP_EOL;
echo 'Predicted log(check): ' . round($logPrice[0], 6) . PHP_EOL . PHP_EOL;
echo 'Coefficients (feature weights):' . PHP_EOL;
echo '0 - visits, 1 - time_on_site_seconds, 2 - pageviews, 3 - discount_percent' . PHP_EOL;
print_r($weights);
echo PHP_EOL . 'Bias (intercept): ' . $bias . PHP_EOL;
Result:
Memory: 1.007 Mb
Time running: 0.014 sec.
Predicted check: 3661.71
Predicted log(check): 8.205686
Coefficients (feature weights):
0 - visits, 1 - time_on_site_seconds, 2 - pageviews, 3 - discount_percent
Array
(
[0] => 0.14071907246965
[1] => -0.00012872304457004
[2] => 0.092300948419521
[3] => -0.085823519417687
)
Bias (intercept): 7.2700340926455
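The printed coefficients let you verify the prediction by hand: in log space, the model's output is just the bias plus the weighted sum of the features. Recomputing it in plain PHP with the weights and bias printed above:

```php
<?php

// Weights and bias as printed by the trained model above
$weights = [0.14071907246965, -0.00012872304457004, 0.092300948419521, -0.085823519417687];
$bias = 7.2700340926455;

// The same customer: [visits, time_on_site_seconds, pageviews, discount_percent]
$customer = [5, 600, 8, 5];

// Linear model in log space: log(check) = bias + sum(w_i * x_i)
$logCheck = $bias;
foreach ($customer as $i => $x) {
    $logCheck += $weights[$i] * $x;
}

echo round($logCheck, 6) . PHP_EOL;      // 8.205686 - matches the model output
echo round(exp($logCheck), 2) . PHP_EOL; // 3661.71  - the predicted check
```

This confirms that Ridge is an ordinary linear model under the hood; only the exp() at the end makes the prediction nonlinear in money terms.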
Why predicting log(check) is a useful trick:
- checks often have a heavy tail; log() reduces the impact of rare large purchases;
- in log-space the model captures multiplicative effects more naturally (e.g. “twice as many pageviews” scales the predicted check by a constant factor rather than adding a fixed amount);
- exp() guarantees a positive predicted check;
- Ridge adds L2 regularization and helps reduce overfitting on small datasets.
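The multiplicative-effects point is easy to see with the weights above: adding dx to a feature multiplies the predicted check by exp(w * dx), no matter what the baseline prediction was. A small illustration in plain PHP using the pageviews weight printed earlier (the baseline log values are arbitrary, chosen only to show that the multiplier does not depend on them):

```php
<?php

$wPageviews = 0.092300948419521; // pageviews weight printed by the model above

// Effect of 4 extra pageviews, tried against three different baselines
$dx = 4;

foreach ([7.0, 8.0, 9.5] as $baseLog) {
    $ratio = exp($baseLog + $wPageviews * $dx) / exp($baseLog);
    echo round($ratio, 4) . PHP_EOL; // the same multiplier every time
}
```

In an additive (raw-money) model, 4 extra pageviews would always add the same number of currency units instead, which is often a worse fit for purchase behavior.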
In a real product you would expand the dataset and feature set (traffic source, category interest, purchase history), but even this baseline can be a good starting point.