Linear regression as a basic model
Case 5. Predicting market salary
We use Ridge regression (linear regression with an L2 penalty on the weights): the penalty shrinks the coefficients, which makes the model more stable on small datasets and helps reduce overfitting.
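Before the library example, a minimal plain-PHP sketch (not Rubix ML) of what the L2 penalty actually does, using the one-dimensional case without an intercept; the data here is made up purely for illustration:

```php
<?php

// One-dimensional ridge regression without an intercept.
// Minimizing sum((y - w*x)^2) + alpha * w^2 gives the closed form
//   w = sum(x*y) / (sum(x^2) + alpha),
// so a larger alpha shrinks the weight toward zero.
function ridgeWeight(array $x, array $y, float $alpha): float
{
    $xy = 0.0;
    $xx = 0.0;
    foreach ($x as $i => $xi) {
        $xy += $xi * $y[$i];
        $xx += $xi * $xi;
    }
    return $xy / ($xx + $alpha);
}

$x = [1, 2, 3, 4];
$y = [2, 4, 6, 8]; // exactly y = 2x

echo ridgeWeight($x, $y, 0.0) . PHP_EOL; // 2 (ordinary least squares)
echo ridgeWeight($x, $y, 1.0) . PHP_EOL; // below 2: shrunk by the penalty
```

With alpha = 0 the formula reduces to ordinary least squares; as alpha grows, the weight is pulled toward zero, trading a little bias for lower variance on small or noisy datasets.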
Example of use
<?php

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Regressors\Ridge;
// Features: experience_years, technologies_score, company_size_level, remote
$samples = [
    [1, 2, 1, 1],
    [3, 4, 2, 1],
    [5, 6, 3, 0],
    [7, 8, 3, 1],
    [10, 10, 3, 1],
];
// Target: salary_usd
$labels = [
    1500,
    2800,
    4500,
    6200,
    8000,
];
$dataset = Labeled::build($samples, $labels);
$model = new Ridge(1.0); // 1.0 is alpha, the strength of the L2 penalty
$model->train($dataset);
// Candidate features for prediction
// [experience_years, technologies_score, company_size_level, remote]
$candidate = [4, 5, 2, 1];
$unlabeled = new Unlabeled([$candidate]);
$prediction = $model->predict($unlabeled);
$salary = $prediction[0];
$weights = array_map(function ($weight) {
    return number_format($weight, 2);
}, $model->coefficients());
$bias = number_format($model->bias(), 2);
echo 'Expected salary: ' . round($salary, 2) . PHP_EOL . PHP_EOL;
echo 'Coefficients (feature weights):' . PHP_EOL;
echo '0 - experience_years, 1 - technologies_score, 2 - company_size_level, 3 - remote' . PHP_EOL;
print_r($weights);
echo PHP_EOL . 'Bias (intercept): ' . $bias . PHP_EOL;
Result:
Memory: 0.995 Mb
Time running: 0.011 sec.
Expected salary: 3750
Coefficients (feature weights):
0 - experience_years, 1 - technologies_score, 2 - company_size_level, 3 - remote
Array
(
    [0] => 387.50
    [1] => 375.00
    [2] => 37.50
    [3] => 25.00
)
Bias (intercept): 225.00
How to interpret the output (weights and bias):
- the weight for experience shows how much the predicted salary increases on average per additional year, all else being equal;
- the weight for the technologies score reflects the premium for a more in-demand stack;
- the weight for company size often correlates with compensation bands;
- the weight for remote work may capture a market adjustment (but in practice it heavily depends on region and company policy).
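The weights can also be read back as an explicit linear formula: the prediction is the bias plus each weight times its feature value. A plain-PHP check using the coefficients and bias printed in the run above:

```php
<?php

// A ridge model is still a linear formula:
//   y = bias + sum(weight_i * feature_i).
// The numbers below are the weights and bias from the run above.
$weights = [387.50, 375.00, 37.50, 25.00];
$bias = 225.00;

// [experience_years, technologies_score, company_size_level, remote]
$candidate = [4, 5, 2, 1];

$salary = $bias;
foreach ($weights as $i => $weight) {
    $salary += $weight * $candidate[$i];
}

echo $salary . PHP_EOL; // 3750, the same value as "Expected salary" above
```

This is why linear models are popular as baselines: every prediction decomposes into per-feature contributions that can be inspected directly.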
In a real task you would add much more data, normalize the features, include categorical signals (region, level, industry), and evaluate the model on a held-out test set.
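As a sketch of the normalization step, here is manual z-score standardization of the feature matrix from the example, in plain PHP. In Rubix ML itself you would typically attach a transformer such as ZScaleStandardizer rather than doing this by hand:

```php
<?php

// Z-score standardization: rescale each feature column to
// mean 0 and standard deviation 1, so that no single feature
// dominates the L2 penalty just because of its scale.
$samples = [
    [1, 2, 1, 1],
    [3, 4, 2, 1],
    [5, 6, 3, 0],
    [7, 8, 3, 1],
    [10, 10, 3, 1],
];

$n = count($samples);
$cols = count($samples[0]);
$standardized = $samples;

for ($j = 0; $j < $cols; $j++) {
    $column = array_column($samples, $j);
    $mean = array_sum($column) / $n;

    $variance = 0.0;
    foreach ($column as $value) {
        $variance += ($value - $mean) ** 2;
    }
    $std = sqrt($variance / $n);

    foreach ($samples as $i => $row) {
        $standardized[$i][$j] = ($row[$j] - $mean) / $std;
    }
}

print_r($standardized[0]); // first sample, now in standardized units
```

Without this step, a feature measured in large units (say, raw salary history) would be penalized far more heavily by the same alpha than a 0/1 flag like remote, which skews the learned weights.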