Linear regression as a basic model

Case 3. Predicting server resource consumption

In production, it is often not enough to know only the response time — we also care whether the infrastructure will handle the load: how much CPU usage will grow, whether we will hit memory limits, whether the database will start choking. Linear regression lets us build a quick, interpretable model that predicts server load from a few simple traffic metrics.

Suppose we have historical monitoring data. For each time window we know a feature vector $\mathbf{x} = (x_1, x_2, x_3, x_4, x_5)$, where:

  • $x_1$ — requests per minute;
  • $x_2$ — average response size (KB);
  • $x_3$ — number of active users on the site;
  • $x_4$ — number of background cron jobs in the interval;
  • $x_5$ — hour of day (an integer from 0 to 23).

The target variable is CPU load in percent. Linear regression in this case defines a model $\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5$, where the coefficients $w_1, \dots, w_5$ and intercept $w_0$ are fitted using historical monitoring data.
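To make the formula concrete, the following minimal sketch evaluates the predicted load for one observation window in plain PHP, without any library. The function name `predictCpuLoad`, the coefficient values, and the sample window are hypothetical, chosen only so that the arithmetic is easy to check by hand; they are not fitted values.

```php
<?php

declare(strict_types=1);

/**
 * Evaluate w_0 + w_1*x_1 + ... + w_n*x_n for a single observation window.
 *
 * @param float   $intercept The intercept w_0
 * @param float[] $weights   Coefficients w_1..w_n
 * @param array   $features  Feature values x_1..x_n, in the same order
 */
function predictCpuLoad(float $intercept, array $weights, array $features): float
{
    if (count($weights) !== count($features)) {
        throw new InvalidArgumentException('Weights and features must have the same length.');
    }

    $prediction = $intercept;

    foreach ($weights as $j => $weight) {
        $prediction += $weight * $features[$j];
    }

    return $prediction;
}

// Hypothetical coefficients (not the result of any training run)
$intercept = 5.0;
$weights = [0.04, 0.5, 0.02, 0.8, 0.1];

// One window: 1000 requests/min, 12 KB responses, 250 users, 3 cron jobs, hour 10
$window = [1000, 12, 250, 3, 10];

printf("Predicted CPU load: %.1f%%\n", predictCpuLoad($intercept, $weights, $window));
```

For these made-up numbers the sum is 5 + 40 + 6 + 5 + 2.4 + 1 = 59.4, so the script prints a predicted load of 59.4%. Each coefficient reads directly as "percentage points of CPU per unit of the feature", which is what makes the model easy to interpret.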

Example of use:

 
<?php

// Load the Composer autoloader (assumes this script sits next to vendor/)
require_once __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Regressors\Ridge;

// Training samples: each row is an observation window
// [requests_per_min, avg_response_size_kb, active_users, cron_jobs, hour]
$samples = [
    [1200, 15, 300, 15, 14],
    [800, 10, 200, 9, 2],
    [1500, 18, 450, 20, 19],
    [400, 8, 120, 5, 4],
];

// Target values: CPU load in percent for each window above
$targets = [
    75.0,
    40.0,
    82.0,
    25.0,
];

// Build a labeled dataset (features X + targets y)
$dataset = Labeled::build($samples, $targets);

// Linear regression on several features, fitted by least squares.
// With a very small L2 penalty (alpha = 1e-6), Ridge regression is
// practically equivalent to ordinary least squares.
$model = new Ridge(1e-6);

// Train the model
$model->train($dataset);
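
Once `train()` has returned, the fitted model can be queried. The sketch below repeats the training step so that it runs on its own, then predicts CPU load for one new window and prints the learned intercept and coefficients. It assumes Rubix ML is installed through Composer with the autoloader at `vendor/autoload.php`; the values of the new window are hypothetical. With only four training rows for six parameters the fit is underdetermined, so the printed numbers should be treated as illustrative only.

```php
<?php

declare(strict_types=1);

// Assumption: Rubix ML was installed with Composer next to this script
require_once __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Regressors\Ridge;

// [requests_per_min, avg_response_size_kb, active_users, cron_jobs, hour]
$samples = [
    [1200, 15, 300, 15, 14],
    [800, 10, 200, 9, 2],
    [1500, 18, 450, 20, 19],
    [400, 8, 120, 5, 4],
];

// CPU load in percent for each window above
$targets = [75.0, 40.0, 82.0, 25.0];

$model = new Ridge(1e-6);
$model->train(Labeled::build($samples, $targets));

// Hypothetical new window to score (no target value is known for it)
$newWindow = new Unlabeled([[1000, 12, 250, 10, 11]]);

// predict() returns one value per row of the dataset
$predictions = $model->predict($newWindow);
printf("Predicted CPU load: %.1f%%\n", $predictions[0]);

// bias() is the intercept w_0; coefficients() holds w_1..w_5 in feature order
printf("w_0 = %.4f\n", $model->bias());

foreach ($model->coefficients() as $j => $weight) {
    printf("w_%d = %.6f\n", $j + 1, $weight);
}
```

Printing the coefficients alongside the prediction is a quick sanity check: a negative weight on requests per minute, for example, would suggest that the training data is too small or too noisy to trust.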