Linear regression as a basic model
Case 5. Predicting market salary
Estimating a market salary is a classic regression task: given a candidate’s features (experience, tech stack, company size, remote work), we want to output a number – the expected salary.
Below is a minimal RubixML baseline: we train Ridge regression on a tiny dataset and then predict salary for a new candidate. This demonstrates the mechanics, not a "real" market model.
- $x_1$ – experience (years);
- $x_2$ – technologies score (a simplified numeric score for the tech stack);
- $x_3$ – company size (level: 1 = small, 2 = medium, 3 = enterprise);
- $x_4$ – remote work (1 = yes, 0 = no).
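Linear regression model: $\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4$, where $\hat{y}$ is the predicted salary, $w_0$ is the intercept, and $w_1, \dots, w_4$ are the learned feature weights. Ridge regression uses the same linear form but adds an L2 penalty on the weights during training, which keeps the fit stable on small datasets.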
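To make the formula concrete, here is a worked prediction under purely hypothetical weights. The values $w_0 = 500$, $w_1 = 600$, $w_2 = 150$, $w_3 = 200$, $w_4 = 100$ are assumptions chosen for easy arithmetic, not the weights the model learns from the data. For a candidate with 5 years of experience, a technologies score of 6, an enterprise employer ($x_3 = 3$), and no remote work ($x_4 = 0$):

$$
\hat{y} = 500 + 600 \cdot 5 + 150 \cdot 6 + 200 \cdot 3 + 100 \cdot 0 = 500 + 3000 + 900 + 600 + 0 = 5000
$$

As a sketch of what the L2 penalty means, Ridge regression in its usual formulation chooses the weights by minimizing the squared error plus a penalty on weight magnitude. The symbols $n$ (number of training samples) and $\alpha$ (penalty strength) are introduced here for illustration, and the intercept $w_0$ is typically left unpenalized:

$$
\min_{w} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \alpha \sum_{j=1}^{4} w_j^2
$$

With $\alpha = 0$ this reduces to ordinary least squares; larger values of $\alpha$ shrink the weights toward zero.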
Example code:
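<?php
// Assumes Rubix ML was installed with Composer; adjust the autoloader path to your project layout.
require_once __DIR__ . '/vendor/autoload.php';
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Regressors\Ridge;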
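// Features: experience_years, technologies_score, company_size_level, remote
$samples = [
    [1, 2, 1, 1],
    [3, 4, 2, 1],
    [5, 6, 3, 0],
    [7, 8, 3, 1],
    [10, 10, 3, 1],
];
// Target: salary_usd (one label per sample, in the same order)
$labels = [
    1500,
    2800,
    4500,
    6200,
    8000,
];
// Pair each feature row with its salary label
$dataset = Labeled::build($samples, $labels);
// Ridge regression with an L2 penalty strength of 1.0
$model = new Ridge(1.0);
// Fit the intercept and feature weights on the training data
$model->train($dataset);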