Linear regression as a basic model
Case 2. Predicting a developer’s task completion time
In this example we use Ridge regression (linear regression with L2 regularization) from RubixML to predict a developer’s task completion time from several numeric features, and we also inspect the learned weights and bias.
Example of use
<?php
use Rubix\ML\Datasets\Unlabeled;

// code.php must define and train $model before it is used here.
require_once __DIR__ . '/code.php';

// Feature order: story_points, files_changed, lines_changed, developer_experience.
$newTask = [6, 4, 300, 18];
$unlabeled = new Unlabeled([$newTask]);

// predict() returns one prediction per sample in the dataset.
$predictions = $model->predict($unlabeled);

// Learned parameters of the linear model.
$weights = $model->coefficients();
$bias = $model->bias();

echo 'Estimated task completion time: ' . round($predictions[0], 1) . 'h' . PHP_EOL . PHP_EOL;
echo 'Coefficients (feature weights):' . PHP_EOL;
echo '0 - story_points, 1 - files_changed, 2 - lines_changed, 3 - developer_experience' . PHP_EOL;
print_r($weights);
echo PHP_EOL . 'Bias (intercept): ' . $bias . PHP_EOL;
Result:
Memory: 1.007 Mb
Time running: 0.01 sec.
Estimated task completion time: 8.5h
Coefficients (feature weights):
0 - story_points, 1 - files_changed, 2 - lines_changed, 3 - developer_experience
Array
(
    [0] => 0.010057024657726
    [1] => 0.013309612870216
    [2] => 0.014949715230614
    [3] => -0.079857930541039
)
Bias (intercept): 5.3364334106445
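The estimate can be reproduced by hand from the printed weights and bias, which is a quick way to see what the model actually computes. This is a minimal sketch, assuming the model was applied to raw, unscaled features (the matching result suggests it was); all values are copied from the output above.

```php
<?php
// Weights and bias copied from the printed result.
$weights = [0.010057024657726, 0.013309612870216, 0.014949715230614, -0.079857930541039];
$bias = 5.3364334106445;

// Same feature order: story_points, files_changed, lines_changed, developer_experience.
$newTask = [6, 4, 300, 18];

// Linear model: prediction = bias + sum(weight_i * feature_i).
$prediction = $bias;
foreach ($weights as $i => $weight) {
    $prediction += $weight * $newTask[$i];
}

echo 'Manual estimate: ' . round($prediction, 1) . 'h' . PHP_EOL;
```

This prints `Manual estimate: 8.5h`, the same value that the model returned.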
Now the model is explainable. Each weight (coefficient) shows how the predicted time changes when its feature grows by one unit while the other features stay fixed:
- the weight for story points shows how many hours, on average, one additional SP adds;
- the weight for the number of files changed reflects context-switching overhead;
- the weight for the number of lines changed often correlates with the amount of manual work;
- a negative weight for developer experience is expected and logical: the more experience, the less time is usually needed for the same task, so the relationship is inverse.
A model like this can already be discussed with the team, and the feature set can be adjusted deliberately.
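One way to prepare such a discussion is to multiply each weight by the task's feature value and look at how many hours each feature contributes to the estimate. This is a rough sketch using the numbers printed above; the variable names are illustrative.

```php
<?php
// Weights copied from the printed result, keyed by feature name.
$weights = [
    'story_points'         => 0.010057024657726,
    'files_changed'        => 0.013309612870216,
    'lines_changed'        => 0.014949715230614,
    'developer_experience' => -0.079857930541039,
];

// The task that was estimated above.
$task = [
    'story_points'         => 6,
    'files_changed'        => 4,
    'lines_changed'        => 300,
    'developer_experience' => 18,
];

// Contribution of a feature = weight * feature value (in hours).
$contributions = [];
foreach ($weights as $name => $weight) {
    $contributions[$name] = $weight * $task[$name];
    printf("%-22s %+.2f h\n", $name, $contributions[$name]);
}
```

For this task, lines_changed accounts for about 4.5 of the estimated 8.5 hours, while one extra story point adds only about 0.01 h. A weight therefore has to be read together with the scale of its feature: a small coefficient on a feature measured in hundreds can outweigh a larger coefficient on a feature measured in single digits.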