Linear regression as a basic model

Case 2. Predicting a developer’s task completion time

In this example, we use Ridge regression from RubixML to predict a developer’s task completion time from several numeric features and, at the same time, inspect the model’s weights and bias.

 
<?php

use Rubix\ML\Datasets\Unlabeled;

// code.php is assumed to train the Ridge model and expose it as $model.
require_once __DIR__ . '/code.php';

// Feature order: story_points, files_changed, lines_changed, developer_experience
$newTask = [6, 4, 300, 18];
$unlabeled = new Unlabeled([$newTask]);
$predictions = $model->predict($unlabeled);

// Learned parameters of the linear model
$weights = $model->coefficients();
$bias = $model->bias();

echo 'Estimated task completion time: ' . round($predictions[0], 1) . 'h' . PHP_EOL . PHP_EOL;

echo 'Coefficients (feature weights):' . PHP_EOL;
echo '0 - story_points, 1 - files_changed, 2 - lines_changed, 3 - developer_experience' . PHP_EOL;
print_r($weights);

echo PHP_EOL . 'Bias (intercept): ' . $bias . PHP_EOL;

Result:

Memory: 1.007 MB, running time: 0.01 sec.
Estimated task completion time: 8.5h

Coefficients (feature weights):
0 - story_points, 1 - files_changed, 2 - lines_changed, 3 - developer_experience
Array
(
    [0] => 0.010057024657726
    [1] => 0.013309612870216
    [2] => 0.014949715230614
    [3] => -0.079857930541039
)

Bias (intercept): 5.3364334106445
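
The estimate can be reproduced by hand from the printed parameters, because a linear model predicts the bias plus the weighted sum of the features. The following is a minimal sketch in plain PHP, not RubixML's own implementation. It assumes the new task has 6 story points, 4 changed files, 300 changed lines, and a developer experience value of 18; the helper name predictLinear is hypothetical, and the weights and bias are copied from the output above.

```php
<?php

// Weights and bias copied from the printed output above.
$weights = [0.010057024657726, 0.013309612870216, 0.014949715230614, -0.079857930541039];
$bias = 5.3364334106445;

// Assumed task: story_points, files_changed, lines_changed, developer_experience.
$task = [6, 4, 300, 18];

/**
 * Hypothetical helper (not part of RubixML):
 * returns bias + sum of weight[i] * feature[i].
 */
function predictLinear(array $weights, float $bias, array $features): float
{
    $sum = $bias;
    foreach ($weights as $i => $weight) {
        $sum += $weight * $features[$i];
    }

    return $sum;
}

$prediction = predictLinear($weights, $bias, $task);

echo round($prediction, 1) . 'h' . PHP_EOL; // prints 8.5h
```

Under these assumptions the manual sum agrees with the 8.5h reported by the model.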

Now the model is explainable, because each weight is simply the coefficient of one feature:

  • the weight for story points shows how many hours, on average, one additional SP adds when the other features stay the same;
  • the weight for the number of changed files reflects context-switching overhead;
  • the weight for the number of changed lines often correlates with the amount of manual work;
  • a negative weight for developer experience is expected and logical: the more experience, the less time is usually needed for the same task, so the relationship is inverse and the weight is negative.
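
One way to make this reading concrete is to multiply each weight by the feature value of a specific task, which gives that feature's contribution to the estimate in hours. The sketch below is plain PHP and only illustrative: the task values (6 story points, 4 files, 300 lines, experience 18) are an assumption, and the weights are copied from the output above.

```php
<?php

$names = ['story_points', 'files_changed', 'lines_changed', 'developer_experience'];

// Weights copied from the printed output above.
$weights = [0.010057024657726, 0.013309612870216, 0.014949715230614, -0.079857930541039];

// Assumed task, in the same feature order as $names.
$task = [6, 4, 300, 18];

// Contribution of each feature to the estimate, in hours: weight * value.
$contributions = [];
foreach ($names as $i => $name) {
    $contributions[$name] = $weights[$i] * $task[$i];
}

// Sort so that the largest absolute contribution comes first.
uasort($contributions, fn (float $a, float $b): int => abs($b) <=> abs($a));

foreach ($contributions as $name => $hours) {
    printf('%-22s %+.2f h%s', $name, $hours, PHP_EOL);
}
```

For this assumed task, lines_changed adds roughly 4.5 hours and developer_experience removes about 1.4 hours, while story points and files contribute only a few minutes each. Raw weights are therefore directly comparable only when the features share a similar scale.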

A model like this can already be discussed with the team, and the feature set can be adjusted deliberately.