Linear regression as a basic model

Case 5. Predicting market salary

Estimating a market salary is a classic regression task: given a candidate’s features (experience, tech stack, company size, remote work), we want to output a number – the expected salary.

Below is a minimal RubixML baseline: we train Ridge regression on a tiny dataset and then predict salary for a new candidate. This demonstrates the mechanics, not a "real" market model.

Input features:

  • $x_1$ – experience (years);
  • $x_2$ – technologies score (a simplified numeric score for the tech stack);
  • $x_3$ – company size (level: 1 = small, 2 = medium, 3 = enterprise);
  • $x_4$ – remote work (1 = yes, 0 = no).

Linear regression model: $\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4$, where $\hat{y}$ is the salary prediction.
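
To make the formula concrete, here is a worked prediction. The weights are hypothetical, picked by hand only to show the arithmetic; they are not the values Ridge would learn from the dataset in this case. Assume $w_0 = 500$, $w_1 = 600$, $w_2 = 100$, $w_3 = 200$, $w_4 = 150$, and a candidate with 6 years of experience, a technologies score of 7, a medium-sized company, and remote work, i.e. $x = (6, 7, 2, 1)$:

$$\hat{y} = 500 + 600 \cdot 6 + 100 \cdot 7 + 200 \cdot 2 + 150 \cdot 1 = 5350$$

Ridge regression chooses the weights by minimizing the squared error plus an L2 penalty on the weights. A common formulation, assuming $n$ training examples, a penalty strength $\alpha$ (notation introduced here, not taken from the text above), and an intercept $w_0$ that is left unpenalized, is:

$$\min_{w_0, \dots, w_4} \; \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \alpha \sum_{j=1}^{4} w_j^2$$

A larger $\alpha$ shrinks the weights toward zero, which tends to stabilize the fit when the dataset is as small as the one used here.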

Example code:

 
<?php

// Assumption: RubixML was installed with Composer, so the autoloader lives here.
require_once __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Regressors\Ridge;

// Features: experience_years, technologies_score, company_size_level, remote
$samples = [
    [1, 2, 1, 1],
    [3, 4, 2, 1],
    [5, 6, 3, 0],
    [7, 8, 3, 1],
    [10, 10, 3, 1],
];

// Target: salary_usd
$labels = [
    1500,
    2800,
    4500,
    6200,
    8000,
];

// Pair each sample with its salary label
$dataset = Labeled::build($samples, $labels);

// Ridge regression; 1.0 is the L2 regularization penalty
$model = new Ridge(1.0);
$model->train($dataset);