Linear regression as a basic model
Case 3. Predicting server resource consumption
In production, it is often not enough to know only the response time — we also care whether the infrastructure will handle the load: how much CPU usage will grow, whether we will hit memory limits, whether the database will start choking. Linear regression lets us build a quick, interpretable model that predicts server load from a few simple traffic metrics.
Suppose we have historical monitoring data. For each time window we know a feature vector $\mathbf{x} = (x_1, x_2, x_3, x_4, x_5)$, where:
- $x_1$ — requests per minute;
- $x_2$ — average response size (KB);
- $x_3$ — number of active users on the site;
- $x_4$ — number of background cron jobs in the interval;
- $x_5$ — hour of day (an integer from 0 to 23).
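To make the feature order concrete, here is a minimal sketch of turning one monitoring record into the vector $\mathbf{x}$. The array keys, the `FEATURE_ORDER` constant, and the `toFeatureVector()` helper are hypothetical names chosen for illustration; they do not come from any particular monitoring tool or library.

```php
<?php
// Fixed feature order x1..x5, matching the list above.
// The key names are illustrative assumptions.
const FEATURE_ORDER = [
    'requests_per_min',
    'avg_response_size_kb',
    'active_users',
    'cron_jobs',
    'hour',
];

// Flatten one monitoring record into a plain numeric feature vector.
function toFeatureVector(array $window): array
{
    $vector = [];
    foreach (FEATURE_ORDER as $name) {
        if (!array_key_exists($name, $window)) {
            throw new InvalidArgumentException("Missing feature: $name");
        }
        $vector[] = $window[$name];
    }
    return $vector;
}

// Hypothetical record for a single time window.
$window = [
    'hour'                 => 14,
    'requests_per_min'     => 1200,
    'active_users'         => 300,
    'avg_response_size_kb' => 15,
    'cron_jobs'            => 15,
];

$x = toFeatureVector($window);
```

Keeping the order in one place means every row passed to the model has the same column layout, regardless of how the raw record was keyed.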
The target variable $y$ is CPU load in percent. Linear regression in this case defines the model $\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5$, where the intercept $w_0$ and the coefficients $w_1, \dots, w_5$ are fitted to the historical monitoring data by minimizing the squared error between $\hat{y}$ and the observed load.
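The formula above is just a weighted sum plus an intercept. The following sketch evaluates it in plain PHP, without any library. The `predictCpuLoad()` function and the numeric coefficients are made up for illustration only; in practice the weights come from training on historical data.

```php
<?php
// Evaluate y_hat = w0 + w1*x1 + ... + wn*xn for one feature vector.
function predictCpuLoad(float $intercept, array $weights, array $features): float
{
    if (count($weights) !== count($features)) {
        throw new InvalidArgumentException('Weights and features must have the same length.');
    }

    $prediction = $intercept;
    foreach ($features as $i => $value) {
        $prediction += $weights[$i] * $value;
    }

    return $prediction;
}

// Hypothetical coefficients, NOT fitted to real data.
$intercept = 5.0;                          // w0
$weights   = [0.03, 0.5, 0.05, 0.4, 0.1];  // w1..w5

// [requests_per_min, avg_response_size_kb, active_users, cron_jobs, hour]
$features = [1200, 15, 300, 15, 14];

$cpu = predictCpuLoad($intercept, $weights, $features);
printf("Predicted CPU load: %.1f%%\n", $cpu); // prints "Predicted CPU load: 70.9%"
```

Each term shows how much one feature contributes to the prediction. With these made-up weights, 1200 requests per minute account for 36 of the 70.9 percentage points, which is what makes the model easy to interpret.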
Example of use:
<?php
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Regressors\Ridge;
// Training samples: each row is an observation window
// [requests_per_min, avg_response_size_kb, active_users, cron_jobs, hour]
$samples = [
    [1200, 15, 300, 15, 14],
    [800, 10, 200, 9, 2],
    [1500, 18, 450, 20, 19],
    [400, 8, 120, 5, 4],
];
// Target values: CPU load in percent for each window above
$targets = [
    75.0,
    40.0,
    82.0,
    25.0,
];
// Build a labeled dataset (features X + targets y)
$dataset = Labeled::build($samples, $targets);
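// Multiple linear regression fitted by (almost) ordinary least squares:
// with a tiny L2 penalty such as 1e-6, Ridge is practically equivalent to
// plain linear regression, while the penalty keeps the fit well-defined
// even with only four training samples
$model = new Ridge(1e-6);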
// Train the model
$model->train($dataset);